Artificial Intelligence
Authors and titles for cs.AI in Jan 2024
[ total of 1917 entries: 1-1917 ][ showing 1917 entries per page: fewer | more ]
- [1] arXiv:2401.00004 [ pdf , ps , other ]
-
Title: Informational non-reductionist theory of consciousness that providing maximum accuracy of reality predictionAuthors: E.E. VityaevComments: 14 pages, 7 figuresSubjects: Artificial Intelligence (cs.AI) ; Neurons and Cognition (q-bio.NC)
Abstract: The paper considers a non-reductionist theory of consciousness, which is not reducible to theories of reality and to physiological or psychological theories. Following D.I.Dubrovsky's "informational approach" to the "Mind-Brain Problem", we consider the reality through the prism of information about observed phenomena, which, in turn, is perceived by subjective reality through sensations, perceptions, feelings, etc., which, in turn, are information about the corresponding brain processes. Within this framework the following principle of the Information Theory of Consciousness (ITS) development is put forward: the brain discovers all possible causal relations in the external world and makes all possible inferences by them. The paper shows that ITS built on this principle: (1) also base on the information laws of the structure of external world; (2) explains the structure and functioning of the brain functional systems and cellular ensembles; (3) ensures maximum accuracy of predictions and the anticipation of reality; (4) resolves emerging contradictions and (5) is an information theory of the brain's reflection of reality.
- [2] arXiv:2401.00005 [ pdf , ps , other ]
-
Title: Consciousness as a logically consistent and prognostic model of realityAuthors: Evgenii VityaevComments: 22 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: The work demonstrates that brain might reflect the external world causal relationships in the form of a logically consistent and prognostic model of reality, which shows up as consciousness. The paper analyses and solves the problem of statistical ambiguity and provides a formal model of causal relationships as probabilistic maximally specific rules. We suppose that brain makes all possible inferences from causal relationships. We prove that the suggested formal model has a property of an unambiguous inference: from consistent premises we infer a consistent conclusion. It enables a set of all inferences to form a consistent model of the perceived world. Causal relationships may create fixed points of cyclic inter-predictable properties. We consider the "natural" classification introduced by John St. Mill and demonstrate that a variety of fixed points of the objects' attributes forms a "natural" classification of the external world. Then we consider notions of "natural" categories and causal models of categories, introduced by Eleanor Rosch and Bob Rehder and demonstrate that fixed points of causal relationships between objects attributes, which we perceive, formalize these notions. If the "natural" classification describes the objects of the external world, and "natural" concepts the perception of these objects, then the theory of integrated information, introduced by G. Tononi, describes the information processes of the brain for "natural" concepts formation that reflects the "natural" classification. We argue that integrated information provides high accuracy of the objects identification. A computer-based experiment is provided that illustrates fixed points formation for coded digits.
- [3] arXiv:2401.00006 [ pdf , other ]
-
Title: Building Open-Ended Embodied Agent via Language-Policy Bidirectional AdaptationAuthors: Shaopeng Zhai , Jie Wang , Tianyi Zhang , Fuxian Huang , Qi Zhang , Ming Zhou , Jing Hou , Yu Qiao , Yu LiuSubjects: Artificial Intelligence (cs.AI)
Abstract: Building embodied agents on integrating Large Language Models (LLMs) and Reinforcement Learning (RL) have revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks. However, existing research faces challenges in meeting the requirement of open-endedness. They typically either train LLM/RL models to adapt to a fixed counterpart, limiting exploration of novel skills and hindering the efficacy of human-AI interaction. To this end, we present OpenPAL, a co-training framework comprising two stages: (1) fine-tuning a pre-trained LLM to translate human instructions into goals for planning, and goal-conditioned training a policy for decision-making; (2) co-training to align the LLM and policy, achieving instruction open-endedness. We conducted experiments using Contra, an open-ended FPS game, demonstrating that an agent trained with OpenPAL not only comprehends arbitrary instructions but also exhibits efficient execution. These results suggest that OpenPAL holds the potential to construct open-ended embodied agents in practical scenarios.
- [4] arXiv:2401.00007 [ pdf , ps , other ]
-
Title: Modeling arousal potential of epistemic emotions using Bayesian information gain: Inquiry cycle driven by free energy fluctuationsSubjects: Artificial Intelligence (cs.AI) ; Information Theory (cs.IT); Neurons and Cognition (q-bio.NC); Applications (stat.AP)
Abstract: Epistemic emotions, such as curiosity and interest, drive the inquiry process. This study proposes a novel formulation of epistemic emotions such as curiosity and interest using two types of information gain generated by the principle of free energy minimization: Kullback-Leibler divergence(KLD) from Bayesian posterior to prior, which represents free energy reduction in recognition, and Bayesian surprise (BS), which represents the expected information gain by Bayesian prior update. By applying a Gaussian generative model with an additional uniform likelihood, we found that KLD and BS form an upward-convex function of surprise (minimized free energy and prediction error), similar to Berlyne's arousal potential functions, or the Wundt curve. We consider that the alternate maximization of BS and KLD generates an ideal inquiry cycle to approach the optimal arousal level with fluctuations in surprise, and that curiosity and interest drive to facilitate the cyclic process. We exhaustively analyzed the effects of prediction uncertainty (prior variance) and observation uncertainty (likelihood variance) on the peaks of the information gain function as optimal surprises. The results show that greater prediction uncertainty, meaning an open-minded attitude, and less observational uncertainty, meaning precise observation with attention, are expected to provide greater information gains through a greater range of exploration. The proposed mathematical framework unifies the free energy principle of the brain and the arousal potential theory to explain the Wundt curve as an information gain function and suggests an ideal inquiry process driven by epistemic emotions.
- [5] arXiv:2401.00009 [ pdf , other ]
-
Title: Turing's Test, a Beautiful Thought ExperimentAuthors: Bernardo GonçalvesComments: 8.4K words, 9 pages, 3 figures, 3 text boxes, few minor correctionsSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: In the wake of large language models, there has been a resurgence of claims and questions about the Turing test and its value for AI, which are reminiscent of decades of practical "Turing" tests. If AI were quantum physics, by now several "Schrödinger's" cats could have been killed. Better late than never, it is time for a historical reconstruction of Turing's beautiful thought experiment. In this paper I present a wealth of evidence, including new archival sources, give original answers to several open questions about Turing's 1950 paper, and address the core question of the value of Turing's test.
- [6] arXiv:2401.00020 [ pdf , other ]
-
Title: AI-driven platform for systematic nomenclature and intelligent knowledge acquisition of natural medicinal materialsComments: 55 pages, 7 figures, 10 supplementary figures, 1 table, 1 supplementary tableSubjects: Artificial Intelligence (cs.AI) ; Databases (cs.DB); Information Retrieval (cs.IR)
Abstract: Natural Medicinal Materials (NMMs) have a long history of global clinical applications, accompanied by extensive informational records. Despite their significant impact on healthcare, the field faces a major challenge: the non-standardization of NMM knowledge, stemming from historical complexities and causing limitations in broader applications. To address this, we introduce a Systematic Nomenclature for NMMs, underpinned by ShennongAlpha, an AI-driven platform designed for intelligent knowledge acquisition. This nomenclature system enables precise identification and differentiation of NMMs. ShennongAlpha, cataloging over ten thousand NMMs with standardized bilingual information, enhances knowledge management and application capabilities, thereby overcoming traditional barriers. Furthermore, it pioneers AI-empowered conversational knowledge acquisition and standardized machine translation. These synergistic innovations mark the first major advance in integrating domain-specific NMM knowledge with AI, propelling research and applications across both NMM and AI fields while establishing a groundbreaking precedent in this crucial area.
- [7] arXiv:2401.00033 [ pdf , other ]
-
Title: Hybrid Modeling Design PatternsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Design patterns provide a systematic way to convey solutions to recurring modeling challenges. This paper introduces design patterns for hybrid modeling, an approach that combines modeling based on first principles with data-driven modeling techniques. While both approaches have complementary advantages there are often multiple ways to combine them into a hybrid model, and the appropriate solution will depend on the problem at hand. In this paper, we provide four base patterns that can serve as blueprints for combining data-driven components with domain knowledge into a hybrid approach. In addition, we also present two composition patterns that govern the combination of the base patterns into more complex hybrid models. Each design pattern is illustrated by typical use cases from application areas such as climate modeling, engineering, and physics.
- [8] arXiv:2401.00062 [ pdf , other ]
-
Title: Semantic Computing for Organizational Effectiveness: From Organization Theory to Practice through Semantics-Based ModellingSubjects: Artificial Intelligence (cs.AI)
Abstract: A critical function of an organization is to foster the level of integration (coordination and cooperation) necessary to achieve its objectives. The need to coordinate and motivation to cooperate emerges from the myriad dependencies between an organization's members and their work. Therefore, to reason about solutions to coordination and cooperation problems requires a robust representation that includes the underlying dependencies. We find that such a representation remains missing from formal organizational models, and we leverage semantics to bridge this gap. Drawing on well-established organizational research and our extensive fieldwork with one of North America's largest municipalities, (1) we introduce an ontology, formalized in first-order logic, that operationalizes concepts like outcome, reward, and epistemic dependence, and their links to potential integration risks; and (2) present real-world applications of this ontology to analyze and support integration in complex government infrastructure projects. Our ontology is implemented and validated in both Z3 and OWL. Key features of our model include inferable dependencies, explainable coordination and cooperation risks, and actionable insights on how dependency structures within an organization can be altered to mitigate the risks. Conceptualizing real-world challenges like incentive misalignment, free-riding, and subgoal optimization in terms of dependency structures, our semantics-based approach represents a novel method for modelling and enhancing coordination and cooperation. Integrated within a decision-support system, our model may serve as an impactful aid for organizational design and effectiveness. More broadly, our approach underscores the transformative potential of semantics in deriving tangible, real-world value from existing organization theory.
- [9] arXiv:2401.00125 [ pdf , other ]
-
Title: LLM-Assist: Enhancing Closed-Loop Planning with Language-Based ReasoningComments: 15 pages, 8 figures, 7 tablesSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV)
Abstract: Although planning is a crucial component of the autonomous driving stack, researchers have yet to develop robust planning algorithms that are capable of safely handling the diverse range of possible driving scenarios. Learning-based planners suffer from overfitting and poor long-tail performance. On the other hand, rule-based planners generalize well, but might fail to handle scenarios that require complex driving maneuvers. To address these limitations, we investigate the possibility of leveraging the common-sense reasoning capabilities of Large Language Models (LLMs) such as GPT4 and Llama2 to generate plans for self-driving vehicles. In particular, we develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner. Guided by commonsense reasoning abilities of LLMs, our approach navigates complex scenarios which existing planners struggle with, produces well-reasoned outputs while also remaining grounded through working alongside the rule-based approach. Through extensive evaluation on the nuPlan benchmark, we achieve state-of-the-art performance, outperforming all existing pure learning- and rule-based methods across most metrics. Our code will be available at this https URL .
- [10] arXiv:2401.00139 [ pdf , other ]
-
Title: Is Knowledge All Large Language Models Needed for Causal Reasoning?Comments: A Python implementation of our proposed method is available at this https URLSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG); Methodology (stat.ME)
Abstract: This paper explores the causal reasoning of large language models (LLMs) to enhance their interpretability and reliability in advancing artificial intelligence. Despite the proficiency of LLMs in a range of tasks, their potential for understanding causality requires further exploration. We propose a novel causal attribution model that utilizes "do-operators" for constructing counterfactual scenarios, allowing us to systematically quantify the influence of input numerical data and LLMs' pre-existing knowledge on their causal reasoning processes. Our newly developed experimental setup assesses LLMs' reliance on contextual information and inherent knowledge across various domains. Our evaluation reveals that LLMs' causal reasoning ability depends on the context and domain-specific knowledge provided, and supports the argument that "knowledge is, indeed, what LLMs principally require for sound causal reasoning". On the contrary, in the absence of knowledge, LLMs still maintain a degree of causal reasoning using the available numerical data, albeit with limitations in the calculations.
- [11] arXiv:2401.00211 [ pdf , other ]
-
Title: Open-TI: Open Traffic Intelligence with Augmented Language ModelAuthors: Longchao Da , Kuanru Liou , Tiejin Chen , Xuesong Zhou , Xiangyong Luo , Yezhou Yang , Hua WeiComments: 22 pages main content, 8 pages appendixSubjects: Artificial Intelligence (cs.AI)
Abstract: Transportation has greatly benefited the cities' development in the modern civilization process. Intelligent transportation, leveraging advanced computer algorithms, could further increase people's daily commuting efficiency. However, intelligent transportation, as a cross-discipline, often requires practitioners to comprehend complicated algorithms and obscure neural networks, bringing a challenge for the advanced techniques to be trusted and deployed in practical industries. Recognizing the expressiveness of the pre-trained large language models, especially the potential of being augmented with abilities to understand and execute intricate commands, we introduce Open-TI. Serving as a bridge to mitigate the industry-academic gap, Open-TI is an innovative model targeting the goal of Turing Indistinguishable Traffic Intelligence, it is augmented with the capability to harness external traffic analysis packages based on existing conversations. Marking its distinction, Open-TI is the first method capable of conducting exhaustive traffic analysis from scratch - spanning from map data acquisition to the eventual execution in complex simulations. Besides, Open-TI is able to conduct task-specific embodiment like training and adapting the traffic signal control policies (TSC), explore demand optimizations, etc. Furthermore, we explored the viability of LLMs directly serving as control agents, by understanding the expected intentions from Open-TI, we designed an agent-to-agent communication mode to support Open-TI conveying messages to ChatZero (control agent), and then the control agent would choose from the action space to proceed the execution. We eventually provide the formal implementation structure, and the open-ended design invites further community-driven enhancements.
- [12] arXiv:2401.00298 [ pdf , ps , other ]
-
Title: Principal-Agent Reward Shaping in MDPsComments: Full version of a paper accepted to AAAI'24Subjects: Artificial Intelligence (cs.AI)
Abstract: Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest. The economic literature has extensively studied principal-agent problems, and recent work has extended this to more complex scenarios such as Markov Decision Processes (MDPs). In this paper, we further explore this line of research by investigating how reward shaping under budget constraints can improve the principal's utility. We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players. The principal offers an additional reward to the agent, and the agent picks their policy selfishly to maximize their reward, which is the sum of the original and the offered reward. Our results establish the NP-hardness of the problem and offer polynomial approximation algorithms for two classes of instances: Stochastic trees and deterministic decision processes with a finite horizon.
- [13] arXiv:2401.00315 [ pdf , other ]
-
Title: Bidirectional Temporal Plan Graph: Enabling Switchable Passing Orders for More Efficient Multi-Agent Path Finding Plan ExecutionComments: Accepted by AAAI-2024Subjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA); Robotics (cs.RO)
Abstract: The Multi-Agent Path Finding (MAPF) problem involves planning collision-free paths for multiple agents in a shared environment. The majority of MAPF solvers rely on the assumption that an agent can arrive at a specific location at a specific timestep. However, real-world execution uncertainties can cause agents to deviate from this assumption, leading to collisions and deadlocks. Prior research solves this problem by having agents follow a Temporal Plan Graph (TPG), enforcing a consistent passing order at every location as defined in the MAPF plan. However, we show that TPGs are overly strict because, in some circumstances, satisfying the passing order requires agents to wait unnecessarily, leading to longer execution time. To overcome this issue, we introduce a new graphical representation called a Bidirectional Temporal Plan Graph (BTPG), which allows switching passing orders during execution to avoid unnecessary waiting time. We design two anytime algorithms for constructing a BTPG: BTPG-naïve and BTPG-optimized. Experimental results show that following BTPGs consistently outperforms following TPGs, reducing unnecessary waits by 8-20%.
- [14] arXiv:2401.00430 [ pdf , other ]
-
Title: Brain-Conditional Multimodal Synthesis: A Survey and TaxonomySubjects: Artificial Intelligence (cs.AI)
Abstract: In the era of Artificial Intelligence Generated Content (AIGC), conditional multimodal synthesis technologies (e.g., text-to-image, text-to-video, text-to-audio, etc) are gradually reshaping the natural content in the real world. The key to multimodal synthesis technology is to establish the mapping relationship between different modalities. Brain signals, serving as potential reflections of how the brain interprets external information, exhibit a distinctive One-to-Many correspondence with various external modalities. This correspondence makes brain signals emerge as a promising guiding condition for multimodal content synthesis. Brian-conditional multimodal synthesis refers to decoding brain signals back to perceptual experience, which is crucial for developing practical brain-computer interface systems and unraveling complex mechanisms underlying how the brain perceives and comprehends external stimuli. This survey comprehensively examines the emerging field of AIGC-based Brain-conditional Multimodal Synthesis, termed AIGC-Brain, to delineate the current landscape and future directions. To begin, related brain neuroimaging datasets, functional brain regions, and mainstream generative models are introduced as the foundation of AIGC-Brain decoding and analysis. Next, we provide a comprehensive taxonomy for AIGC-Brain decoding models and present task-specific representative work and detailed implementation strategies to facilitate comparison and in-depth analysis. Quality assessments are then introduced for both qualitative and quantitative evaluation. Finally, this survey explores insights gained, providing current challenges and outlining prospects of AIGC-Brain. Being the inaugural survey in this domain, this paper paves the way for the progress of AIGC-Brain research, offering a foundational overview to guide future work.
- [15] arXiv:2401.00544 [ pdf , other ]
-
Title: A Reliable Knowledge Processing Framework for Combustion Science using Foundation ModelsComments: 38 pages and 10 figures; Fixed figure resolutionSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: This research explores the integration of large language models (LLMs) into scientific data assimilation, focusing on combustion science as a case study. Leveraging foundational models integrated with Retrieval-Augmented Generation (RAG) framework, the study introduces an approach to process diverse combustion research data, spanning experimental studies, simulations, and literature. The multifaceted nature of combustion research emphasizes the critical role of knowledge processing in navigating and extracting valuable information from a vast and diverse pool of sources. The developed approach minimizes computational and economic expenses while optimizing data privacy and accuracy. It incorporates prompt engineering and offline open-source LLMs, offering user autonomy in selecting base models. The study provides a thorough examination of text segmentation strategies, conducts comparative studies between LLMs, and explores various optimized prompts to demonstrate the effectiveness of the framework. By incorporating an external database, the framework outperforms a conventional LLM in generating accurate responses and constructing robust arguments. Additionally, the study delves into the investigation of optimized prompt templates for the purpose of efficient extraction of scientific literature. The research addresses concerns related to hallucinations and false research articles by introducing a custom workflow developed with a detection algorithm to filter out inaccuracies. Despite identified areas for improvement, the framework consistently delivers accurate domain-specific responses with minimal human oversight. The prompt-agnostic approach introduced holds promise for future deliberations. The study underscores the significance of integrating LLMs and knowledge processing techniques in scientific research, providing a foundation for advancements in data assimilation and utilization.
- [16] arXiv:2401.00546 [ pdf , other ]
-
Title: AllSpark: A Multimodal Spatio-Temporal General Intelligence Model with Thirteen ModalitiesAuthors: Run Shao , Cheng Yang , Qiujun Li , Qing Zhu , Yongjun Zhang , YanSheng Li , Yu Liu , Yong Tang , Dapeng Liu , Shizhong Yang , Haifeng LiComments: 49 pages, 16 tables, 3 figuresSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: For a long time, due to the high heterogeneity in structure and semantics among various spatiotemporal modal data, the joint interpretation of multimodal spatiotemporal data has been an extremely challenging problem. The primary challenge resides in striking a trade-off between the cohesion and autonomy of diverse modalities, and this trade-off exhibits a progressively nonlinear nature as the number of modalities expands. We introduce the Language as Reference Framework (LaRF), a fundamental principle for constructing a multimodal unified model, aiming to strike a trade-off between the cohesion and autonomy among different modalities. We propose a multimodal spatiotemporal general artificial intelligence model, called AllSpark. Our model integrates thirteen different modalities into a unified framework, including 1D (text, code), 2D (RGB, infrared, SAR, multispectral, hyperspectral, tables, graphs, trajectory, oblique photography), and 3D (point clouds, videos) modalities. To achieve modal cohesion, AllSpark uniformly maps diverse modal features to the language modality. In addition, we design modality-specific prompts to guide multi-modal large language models in accurately perceiving multimodal data. To maintain modality autonomy, AllSpark introduces modality-specific encoders to extract the tokens of various spatiotemporal modalities. And modal bridge is employed to achieve dimensional projection from each modality to the language modality. Finally, observing a gap between the model's interpretation and downstream tasks, we designed task heads to enhance the model's generalization capability on specific downstream tasks. Experiments indicate that AllSpark achieves competitive accuracy in modalities such as RGB and trajectory compared to state-of-the-art models.
- [17] arXiv:2401.00588 [ pdf , other ]
-
Title: Fairness in Serving Large Language ModelsAuthors: Ying Sheng , Shiyi Cao , Dacheng Li , Banghua Zhu , Zhuohan Li , Danyang Zhuo , Joseph E. Gonzalez , Ion StoicaSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Performance (cs.PF)
Abstract: High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of requests from short chat conversations to long document reading. To ensure that all client requests are processed fairly, most major LLM inference services have request rate limits, to ensure that no client can dominate the request queue. However, this rudimentary notion of fairness also results in under-utilization of the resources and poor client experience when there is spare capacity. While there is a rich literature on fair scheduling, serving LLMs presents new challenges due to their unpredictable request lengths and their unique batching characteristics on parallel accelerators. This paper introduces the definition of LLM serving fairness based on a cost function that accounts for the number of input and output tokens processed. To achieve fairness in serving, we propose a novel scheduling algorithm, the Virtual Token Counter (VTC), a fair scheduler based on the continuous batching mechanism. We prove a 2x tight upper bound on the service difference between two backlogged clients, adhering to the requirement of work-conserving. Through extensive experiments, we demonstrate the superior performance of VTC in ensuring fairness, especially in contrast to other baseline methods, which exhibit shortcomings under various conditions.
- [18] arXiv:2401.00832 [ pdf , other ]
-
Title: Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science EducationAuthors: Arne Bewersdorff , Christian Hartmann , Marie Hornberger , Kathrin Seßler , Maria Bannert , Enkelejda Kasneci , Gjergji Kasneci , Xiaoming Zhai , Claudia NerdelSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: The integration of Artificial Intelligence (AI), particularly Large Language Model (LLM)-based systems, in education has shown promise in enhancing teaching and learning experiences. However, the advent of Multimodal Large Language Models (MLLMs) like GPT-4 with vision (GPT-4V), capable of processing multimodal data including text, sound, and visual inputs, opens a new era of enriched, personalized, and interactive learning landscapes in education. Grounded in theory of multimedia learning, this paper explores the transformative role of MLLMs in central aspects of science education by presenting exemplary innovative learning scenarios. Possible applications for MLLMs could range from content creation to tailored support for learning, fostering competencies in scientific practices, and providing assessment and feedback. These scenarios are not limited to text-based and uni-modal formats but can be multimodal, increasing thus personalization, accessibility, and potential learning effectiveness. Besides many opportunities, challenges such as data protection and ethical considerations become more salient, calling for robust frameworks to ensure responsible integration. This paper underscores the necessity for a balanced approach in implementing MLLMs, where the technology complements rather than supplants the educator's role, ensuring thus an effective and ethical use of AI in science education. It calls for further research to explore the nuanced implications of MLLMs on the evolving role of educators and to extend the discourse beyond science education to other disciplines. Through the exploration of potentials, challenges, and future implications, we aim to contribute to a preliminary understanding of the transformative trajectory of MLLMs in science education and beyond.
- [19] arXiv:2401.00880 [ pdf , other ]
-
Title: Towards Bridging the Gap between High-Level Reasoning and Execution on RobotsAuthors: Till HofmannComments: PhD ThesisSubjects: Artificial Intelligence (cs.AI)
Abstract: When reasoning about actions, e.g., by means of task planning or agent programming with Golog, the robot's actions are typically modeled on an abstract level, where complex actions such as picking up an object are treated as atomic primitives with deterministic effects and preconditions that only depend on the current state. However, when executing such an action on a robot it can no longer be seen as a primitive. Instead, action execution is a complex task involving multiple steps with additional temporal preconditions and timing constraints. Furthermore, the action may be noisy, e.g., producing erroneous sensing results and not always having the desired effects. While these aspects are typically ignored in reasoning tasks, they need to be dealt with during execution. In this thesis, we propose several approaches towards closing this gap.
- [20] arXiv:2401.00996 [ pdf , other ]
-
Title: Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression against Heterogeneous Attacks Toward AI Software DeploymentComments: Accepted by IEEE Transactions on Software Engineering (TSE). Camera-ready Version. arXiv admin note: substantial text overlap with arXiv:2208.05969Subjects: Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Abstract: The size of deep learning models in artificial intelligence (AI) software is increasing rapidly, hindering the large-scale deployment on resource-restricted devices (e.g., smartphones). To mitigate this issue, AI software compression plays a crucial role, which aims to compress model size while keeping high performance. However, the intrinsic defects in a big model may be inherited by the compressed one. Such defects may be easily leveraged by adversaries, since a compressed model is usually deployed in a large number of devices without adequate protection. In this article, we aim to address the safe model compression problem from the perspective of safety-performance co-optimization. Specifically, inspired by the test-driven development (TDD) paradigm in software engineering, we propose a test-driven sparse training framework called SafeCompress. By simulating the attack mechanism as safety testing, SafeCompress can automatically compress a big model to a small one following the dynamic sparse training paradigm. Then, considering two kinds of representative and heterogeneous attack mechanisms, i.e., black-box membership inference attack and white-box membership inference attack, we develop two concrete instances called BMIA-SafeCompress and WMIA-SafeCompress. Further, we implement another instance called MMIA-SafeCompress by extending SafeCompress to defend against the occasion when adversaries conduct black-box and white-box membership inference attacks simultaneously. We conduct extensive experiments on five datasets for both computer vision and natural language processing tasks. The results show the effectiveness and generalizability of our framework. We also discuss how to adapt SafeCompress to other attacks besides membership inference attack, demonstrating the flexibility of SafeCompress.
- [21] arXiv:2401.01040 [ pdf , other ]
-
Title: Towards Cognitive AI Systems: a Survey and Prospective on Neuro-Symbolic AIAuthors: Zishen Wan , Che-Kai Liu , Hanchen Yang , Chaojian Li , Haoran You , Yonggan Fu , Cheng Wan , Tushar Krishna , Yingyan Lin , Arijit RaychowdhuryComments: Workshop on Systems for Next-Gen AI Paradigms, 6th Conference on Machine Learning and Systems (MLSys), June 4-8, 2023, Miami, FL, USASubjects: Artificial Intelligence (cs.AI) ; Hardware Architecture (cs.AR)
Abstract: The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural networks, have significantly impacted various aspects of our lives. However, the current challenges surrounding unsustainable computational trajectories, limited robustness, and a lack of explainability call for the development of next-generation AI systems. Neuro-symbolic AI (NSAI) emerges as a promising paradigm, fusing neural, symbolic, and probabilistic approaches to enhance interpretability, robustness, and trustworthiness while facilitating learning from much less data. Recent NSAI systems have demonstrated great potential in collaborative human-AI scenarios with reasoning and cognitive capabilities. In this paper, we provide a systematic review of recent progress in NSAI and analyze the performance characteristics and computational operators of NSAI models. Furthermore, we discuss the challenges and potential future directions of NSAI from both system and architectural perspectives.
- [22] arXiv:2401.01425 [ pdf , other ]
-
Title: SwapTransformer: highway overtaking tactical planner model via imitation learning on OSHA datasetComments: 19 pages, 12 Figures, 1 Algorithm, 2 TablesSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: This paper investigates the high-level decision-making problem in highway scenarios regarding lane changing and over-taking other slower vehicles. In particular, this paper aims to improve the Travel Assist feature for automatic overtaking and lane changes on highways. About 9 million samples including lane images and other dynamic objects are collected in simulation. This data; Overtaking on Simulated HighwAys (OSHA) dataset is released to tackle this challenge. To solve this problem, an architecture called SwapTransformer is designed and implemented as an imitation learning approach on the OSHA dataset. Moreover, auxiliary tasks such as future points and car distance network predictions are proposed to aid the model in better understanding the surrounding environment. The performance of the proposed solution is compared with a multi-layer perceptron (MLP) and multi-head self-attention networks as baselines in a simulation environment. We also demonstrate the performance of the model with and without auxiliary tasks. All models are evaluated based on different metrics such as time to finish each lap, number of overtakes, and speed difference with speed limit. The evaluation shows that the SwapTransformer model outperforms other models in different traffic densities in the inference phase.
- [23] arXiv:2401.01459 [ pdf , other ]
-
Title: Outlier Ranking in Large-Scale Public Health StreamsAuthors: Ananya Joshi , Tina Townes , Nolan Gormley , Luke Neureiter , Roni Rosenfeld , Bryan WilderComments: 6 figures, 8 pagesSubjects: Artificial Intelligence (cs.AI)
Abstract: Disease control experts inspect public health data streams daily for outliers worth investigating, like those corresponding to data quality issues or disease outbreaks. However, they can only examine a few of the thousands of maximally-tied outliers returned by univariate outlier detection methods applied to large-scale public health data streams. To help experts distinguish the most important outliers from these thousands of tied outliers, we propose a new task for algorithms to rank the outputs of any univariate method applied to each of many streams. Our novel algorithm for this task, which leverages hierarchical networks and extreme value analysis, performed the best across traditional outlier detection metrics in a human-expert evaluation using public health data streams. Most importantly, experts have used our open-source Python implementation since April 2023 and report identifying outliers worth investigating 9.1x faster than their prior baseline. Other organizations can readily adapt this implementation to create rankings from the outputs of their tailored univariate methods across large-scale streams.
- [24] arXiv:2401.01596 [ pdf , other ]
-
Title: MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical QueriesAuthors: Akash Ghosh , Arkadeep Acharya , Prince Jha , Aniket Gaudgaul , Rajdeep Majumdar , Sriparna Saha , Aman Chadha , Raghav Jain , Setu Sinha , Shivani AgarwalComments: ECIR 2024Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Also prior works in the area of medical question summarisation have been limited to the English language. This work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization MMCQS dataset, which combines Hindi-English codemixed medical queries with visual aids. This integration enriches the representation of a patient's medical condition, providing a more comprehensive perspective. We also propose a framework named MedSumm that leverages the power of LLMs and VLMs for this task. By utilizing our MMCQS dataset, we demonstrate the value of integrating visual information from images to improve the creation of medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future exploration in personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available.
- [25] arXiv:2401.01620 [ pdf , other ]
-
Title: Large Language Model Capabilities in Perioperative Risk Prediction and PrognosticationAuthors: Philip Chung , Christine T Fong , Andrew M Walters , Nima Aghaeepour , Meliha Yetisgen , Vikas N O'Reilly-ShahSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: We investigate whether general-domain large language models such as GPT-4 Turbo can perform risk stratification and predict post-operative outcome measures using a description of the procedure and a patient's clinical notes derived from the electronic health record. We examine predictive performance on 8 different tasks: prediction of ASA Physical Status Classification, hospital admission, ICU admission, unplanned admission, hospital mortality, PACU Phase 1 duration, hospital duration, and ICU duration. Few-shot and chain-of-thought prompting improves predictive performance for several of the tasks. We achieve F1 scores of 0.50 for ASA Physical Status Classification, 0.81 for ICU admission, and 0.86 for hospital mortality. Performance on duration prediction tasks were universally poor across all prompt strategies. Current generation large language models can assist clinicians in perioperative risk stratification on classification tasks and produce high-quality natural language summaries and explanations.
- [26] arXiv:2401.01623 [ pdf , other ]
-
Title: Can AI Be as Creative as Humans?Authors: Haonan Wang , James Zou , Michael Mozer , Anirudh Goyal , Alex Lamb , Linjun Zhang , Weijie J Su , Zhun Deng , Michael Qizhe Xie , Hannah Brown , Kenji KawaguchiComments: The paper examines AI's creativity, introducing Relative and Statistical Creativity for theoretical and practical analysis, along with practical training guidelines. Project Page: ai-relative-creativity.github.ioSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Creativity serves as a cornerstone for societal progress and innovation. With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application. In this paper, we prove in theory that AI can be as creative as humans under the condition that it can properly fit the data generated by human creators. Therefore, the debate on AI's creativity is reduced into the question of its ability to fit a sufficient amount of data. To arrive at this conclusion, this paper first addresses the complexities in defining creativity by introducing a new concept called Relative Creativity. Rather than attempting to define creativity universally, we shift the focus to whether AI can match the creative abilities of a hypothetical human. The methodological shift leads to a statistically quantifiable assessment of AI's creativity, term Statistical Creativity. This concept, statistically comparing the creative abilities of AI with those of specific human groups, facilitates theoretical exploration of AI's creative potential. Our analysis reveals that by fitting extensive conditional data without marginalizing out the generative conditions, AI can emerge as a hypothetical new creator. The creator possesses the same creative abilities on par with the human creators it was trained on. Building on theoretical findings, we discuss the application in prompt-conditioned autoregressive models, providing a practical means for evaluating creative abilities of generative AI models, such as Large Language Models (LLMs). Additionally, this study provides an actionable training guideline, bridging the theoretical quantification of creativity with practical model training.
- [27] arXiv:2401.01630 [ pdf , other ]
-
Title: A Cybersecurity Risk Analysis Framework for Systems with Artificial Intelligence ComponentsComments: 54 pages, 18 tables, 6 figuresSubjects: Artificial Intelligence (cs.AI) ; Cryptography and Security (cs.CR); Applications (stat.AP)
Abstract: The introduction of the European Union Artificial Intelligence Act, the NIST Artificial Intelligence Risk Management Framework, and related norms demands a better understanding and implementation of novel risk analysis approaches to evaluate systems with Artificial Intelligence components. This paper provides a cybersecurity risk analysis framework that can help assessing such systems. We use an illustrative example concerning automated driving systems.
- [28] arXiv:2401.01753 [ src ]
-
Title: A Generative AI Assistant to Accelerate Cloud MigrationComments: arXiv admin comment: This version has been removed by arXiv administrators as the submitter did not have the rights to agree to the license at the time of submissionSubjects: Artificial Intelligence (cs.AI)
Abstract: We present a tool that leverages generative AI to accelerate the migration of on-premises applications to the cloud. The Cloud Migration LLM accepts input from the user specifying the parameters of their migration, and outputs a migration strategy with an architecture diagram. A user study suggests that the migration LLM can assist inexperienced users in finding the right cloud migration profile, while avoiding complexities of a manual approach.
- [29] arXiv:2401.01772 [ pdf , ps , other ]
-
Title: A Novel Paradigm for Neural Computation: X-Net with Learnable Neurons and Adaptable StructureComments: 31 pages;Subjects: Artificial Intelligence (cs.AI) ; Networking and Internet Architecture (cs.NI)
Abstract: Artificial neural networks (ANNs) have permeated various disciplinary domains, ranging from bioinformatics to financial analytics, where their application has become an indispensable facet of contemporary scientific research endeavors. However, the inherent limitations of traditional neural networks arise due to their relatively fixed network structures and activation functions. 1, The type of activation function is single and relatively fixed, which leads to poor "unit representation ability" of the network, and it is often used to solve simple problems with very complex networks; 2, the network structure is not adaptive, it is easy to cause network structure redundant or insufficient. To address the aforementioned issues, this study proposes a novel neural network called X-Net. By utilizing our designed Alternating Backpropagation mechanism, X-Net dynamically selects appropriate activation functions based on derivative information during training to enhance the network's representation capability for specific tasks. Simultaneously, it accurately adjusts the network structure at the neuron level to accommodate tasks of varying complexities and reduce computational costs. Through a series of experiments, we demonstrate the dual advantages of X-Net in terms of reducing model size and improving representation power. Specifically, in terms of the number of parameters, X-Net is only 3$\%$ of baselines on average, and only 1.4$\%$ under some tasks. In terms of representation ability, X-Net can achieve an average $R^2$=0.985 on the fitting task by only optimizing the activation function without introducing any parameters. Finally, we also tested the ability of X-Net to help scientific discovery on data from multiple disciplines such as society, energy, environment, and aerospace, and achieved concise and good results.
- [30] arXiv:2401.01814 [ pdf , other ]
-
Title: Large Language Models Relearn Removed ConceptsSubjects: Artificial Intelligence (cs.AI)
Abstract: Advances in model editing through neuron pruning hold promise for removing undesirable concepts from large language models. However, it remains unclear whether models have the capacity to reacquire pruned concepts after editing. To investigate this, we evaluate concept relearning in models by tracking concept saliency and similarity in pruned neurons during retraining. Our findings reveal that models can quickly regain performance post-pruning by relocating advanced concepts to earlier layers and reallocating pruned concepts to primed neurons with similar semantics. This demonstrates that models exhibit polysemantic capacities and can blend old and new concepts in individual neurons. While neuron pruning provides interpretability into model concepts, our results highlight the challenges of permanent concept removal for improved model \textit{safety}. Monitoring concept reemergence and developing techniques to mitigate relearning of unsafe concepts will be important directions for more robust model editing. Overall, our work strongly demonstrates the resilience and fluidity of concept representations in LLMs post concept removal.
- [31] arXiv:2401.01836 [ pdf , other ]
-
Title: Neural Control: Concurrent System Identification and Control Learning with Neural ODEAuthors: Cheng ChiComments: 9 pages, code open sourced in format of Google Colab notebooks; Resubmitted for adding missed references in the last submissionSubjects: Artificial Intelligence (cs.AI)
Abstract: Controlling continuous-time dynamical systems is generally a two step process: first, identify or model the system dynamics with differential equations, then, minimize the control objectives to achieve optimal control function and optimal state trajectories. However, any inaccuracy in dynamics modeling will lead to sub-optimality in the resulting control function. To address this, we propose a neural ODE based method for controlling unknown dynamical systems, denoted as Neural Control (NC), which combines dynamics identification and optimal control learning using a coupled neural ODE. Through an intriguing interplay between the two neural networks in coupled neural ODE structure, our model concurrently learns system dynamics as well as optimal controls that guides towards target states. Our experiments demonstrate the effectiveness of our model for learning optimal control of unknown dynamical systems. Codes available at this https URL
- [32] arXiv:2401.01841 [ pdf , other ]
-
Title: Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision ProcessesComments: Accepted for publication at the International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2024Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: A fundamental (and largely open) challenge in sequential decision-making is dealing with non-stationary environments, where exogenous environmental conditions change over time. Such problems are traditionally modeled as non-stationary Markov decision processes (NSMDP). However, existing approaches for decision-making in NSMDPs have two major shortcomings: first, they assume that the updated environmental dynamics at the current time are known (although future dynamics can change); and second, planning is largely pessimistic, i.e., the agent acts ``safely'' to account for the non-stationary evolution of the environment. We argue that both these assumptions are invalid in practice -- updated environmental conditions are rarely known, and as the agent interacts with the environment, it can learn about the updated dynamics and avoid being pessimistic, at least in states whose dynamics it is confident about. We present a heuristic search algorithm called \textit{Adaptive Monte Carlo Tree Search (ADA-MCTS)} that addresses these challenges. We show that the agent can learn the updated dynamics of the environment over time and then act as it learns, i.e., if the agent is in a region of the state space about which it has updated knowledge, it can avoid being pessimistic. To quantify ``updated knowledge,'' we disintegrate the aleatoric and epistemic uncertainty in the agent's updated belief and show how the agent can use these estimates for decision-making. We compare the proposed approach with the multiple state-of-the-art approaches in decision-making across multiple well-established open-source problems and empirically show that our approach is faster and highly adaptive without sacrificing safety.
- [33] arXiv:2401.02500 [ pdf , other ]
-
Title: On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)Authors: Vishal Pallagani , Kaushik Roy , Bharath Muppasani , Francesco Fabiano , Andrea Loreggia , Keerthiram Murugesan , Biplav Srivastava , Francesca Rossi , Lior Horesh , Amit ShethSubjects: Artificial Intelligence (cs.AI)
Abstract: Automated Planning and Scheduling is among the growing areas in Artificial Intelligence (AI) where mention of LLMs has gained popularity. Based on a comprehensive review of 126 papers, this paper investigates eight categories based on the unique applications of LLMs in addressing various aspects of planning problems: language translation, plan generation, model construction, multi-agent planning, interactive planning, heuristics optimization, tool integration, and brain-inspired planning. For each category, we articulate the issues considered and existing gaps. A critical insight resulting from our review is that the true potential of LLMs unfolds when they are integrated with traditional symbolic planners, pointing towards a promising neuro-symbolic approach. This approach effectively combines the generative aspects of LLMs with the precision of classical planning methods. By synthesizing insights from existing literature, we underline the potential of this integration to address complex planning challenges. Our goal is to encourage the ICAPS community to recognize the complementary strengths of LLMs and symbolic planners, advocating for a direction in automated planning that leverages these synergistic capabilities to develop more advanced and intelligent planning systems.
- [34] arXiv:2401.02549 [ pdf , ps , other ]
-
Title: Quantitative Technology Forecasting: a Review of Trend Extrapolation MethodsAuthors: Peng-Hung Tsai , Daniel Berleant , Richard S. Segall , Hyacinthe Aboudja , Venkata Jaipal R. Batthula , Sheela Duggirala , Michael HowellJournal-ref: International Journal of Innovation and Technology Management (2023), 20(4):2330002Subjects: Artificial Intelligence (cs.AI) ; Applications (stat.AP)
Abstract: Quantitative technology forecasting uses quantitative methods to understand and project technological changes. It is a broad field encompassing many different techniques and has been applied to a vast range of technologies. A widely used approach in this field is trend extrapolation. Based on the publications available to us, there has been little or no attempt made to systematically review the empirical evidence on quantitative trend extrapolation techniques. This study attempts to close this gap by conducting a systematic review of technology forecasting literature addressing the application of quantitative trend extrapolation techniques. We identified 25 studies relevant to the objective of this research and classified the techniques used in the studies into different categories, among which growth curves and time series methods were shown to remain popular over the past decade, while newer methods, such as machine learning-based hybrid models, have emerged in recent years. As more effort and evidence are needed to determine if hybrid models are superior to traditional methods, we expect to see a growing trend in the development and application of hybrid models to technology forecasting.
- [35] arXiv:2401.02620 [ pdf , other ]
-
Title: Progress and Prospects in 3D Generative AI: A Technical Overview including 3D humanSubjects: Artificial Intelligence (cs.AI) ; Graphics (cs.GR)
Abstract: While AI-generated text and 2D images continue to expand its territory, 3D generation has gradually emerged as a trend that cannot be ignored. Since the year 2023 an abundant amount of research papers has emerged in the domain of 3D generation. This growth encompasses not just the creation of 3D objects, but also the rapid development of 3D character and motion generation. Several key factors contribute to this progress. The enhanced fidelity in stable diffusion, coupled with control methods that ensure multi-view consistency, and realistic human models like SMPL-X, contribute synergistically to the production of 3D models with remarkable consistency and near-realistic appearances. The advancements in neural network-based 3D storing and rendering models, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have accelerated the efficiency and realism of neural rendered models. Furthermore, the multimodality capabilities of large language models have enabled language inputs to transcend into human motion outputs. This paper aims to provide a comprehensive overview and summary of the relevant papers published mostly during the latter half year of 2023. It will begin by discussing the AI generated object models in 3D, followed by the generated 3D human models, and finally, the generated 3D human motions, culminating in a conclusive summary and a vision for the future.
- [36] arXiv:2401.02643 [ pdf , other ]
-
Title: Training and Serving System of Foundation Models: A Comprehensive SurveyAuthors: Jiahang Zhou , Yanyu Chen , Zicong Hong , Wuhui Chen , Yue Yu , Tao Zhang , Hui Wang , Chuanfu Zhang , Zibin ZhengSubjects: Artificial Intelligence (cs.AI)
Abstract: Foundation models (e.g., ChatGPT, DALL-E, PengCheng Mind, PanGu-$\Sigma$) have demonstrated extraordinary performance in key technological areas, such as natural language processing and visual recognition, and have become the mainstream trend of artificial general intelligence. This has led more and more major technology giants to dedicate significant human and financial resources to actively develop their foundation model systems, which drives continuous growth of these models' parameters. As a result, the training and serving of these models have posed significant challenges, including substantial computing power, memory consumption, bandwidth demands, etc. Therefore, employing efficient training and serving strategies becomes particularly crucial. Many researchers have actively explored and proposed effective methods. So, a comprehensive survey of them is essential for system developers and researchers. This paper extensively explores the methods employed in training and serving foundation models from various perspectives. It provides a detailed categorization of these state-of-the-art methods, including finer aspects such as network, computing, and storage. Additionally, the paper summarizes the challenges and presents a perspective on the future development direction of foundation model systems. Through comprehensive discussion and analysis, it hopes to provide a solid theoretical basis and practical guidance for future research and applications, promoting continuous innovation and development in foundation model systems.
- [37] arXiv:2401.02703 [ pdf , other ]
-
Title: Verifying Relational Explanations: A Probabilistic ApproachComments: Published in Proceedings of 2023 IEEE Conference on Big DataSubjects: Artificial Intelligence (cs.AI)
Abstract: Explanations on relational data are hard to verify since the explanation structures are more complex (e.g. graphs). To verify interpretable explanations (e.g. explanations of predictions made in images, text, etc.), typically human subjects are used since it does not necessarily require a lot of expertise. However, to verify the quality of a relational explanation requires expertise and is hard to scale-up. GNNExplainer is arguably one of the most popular explanation methods for Graph Neural Networks. In this paper, we develop an approach where we assess the uncertainty in explanations generated by GNNExplainer. Specifically, we ask the explainer to generate explanations for several counterfactual examples. We generate these examples as symmetric approximations of the relational structure in the original data. From these explanations, we learn a factor graph model to quantify uncertainty in an explanation. Our results on several datasets show that our approach can help verify explanations from GNNExplainer by reliably estimating the uncertainty of a relation specified in the explanation.
- [38] arXiv:2401.02705 [ pdf , other ]
-
Title: XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language ModelAuthors: Zhitao Wang , Wei Wang , Zirao Li , Long Wang , Can Yi , Xinjie Xu , Luyang Cao , Hanjing Su , Shouzhi Chen , Jun ZhouSubjects: Artificial Intelligence (cs.AI)
Abstract: In past years, we have been dedicated to automating user acceptance testing (UAT) process of WeChat Pay, one of the most influential mobile payment applications in China. A system titled XUAT has been developed for this purpose. However, there is still a human-labor-intensive stage, i.e, test scripts generation, in the current system. Therefore, in this paper, we concentrate on methods of boosting the automation level of the current system, particularly the stage of test scripts generation. With recent notable successes, large language models (LLMs) demonstrate significant potential in attaining human-like intelligence and there has been a growing research area that employs LLMs as autonomous agents to obtain human-like decision-making capabilities. Inspired by these works, we propose an LLM-powered multi-agent collaborative system, named XUAT-Copilot, for automated UAT. The proposed system mainly consists of three LLM-based agents responsible for action planning, state checking and parameter selecting, respectively, and two additional modules for state sensing and case rewriting. The agents interact with testing device, make human-like decision and generate action command in a collaborative way. The proposed multi-agent system achieves a close effectiveness to human testers in our experimental studies and gains a significant improvement of Pass@1 accuracy compared with single-agent architecture. More importantly, the proposed system has launched in the formal testing environment of WeChat Pay mobile app, which saves a considerable amount of manpower in the daily development work.
- [39] arXiv:2401.02726 [ pdf , ps , other ]
-
Title: Une ontologie pour les syst{è}mes multi-agents ambiants dans les villes intelligentesComments: in French languageJournal-ref: TOTh 2020, Christophe Roche, Nov 2020, Chamb{\'e}ry (73), France. pp.199-216Subjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: Towns and cities are currently equipping themselves with a host of connected devices, with a view to transforming themselves into ''smart cities''. To manage this mass of connected objects, autonomous software entities, known as agents, can be attached to them to cooperate and use these devices to offer personalized services. However, this object infrastructure needs to be semantically structured in order to be exploited. This is why the proposal of this article is an ontology, formatted in OWL, describing the object infrastructures, their links with the organization of the multi-agent system and the services to be delivered according to the users of the system. The ontology is applied to smart mobility for people with reduced mobility, and could be adapted to other smart city axes.
- [40] arXiv:2401.02731 [ pdf , other ]
-
Title: Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General TasksSubjects: Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have demonstrated considerable proficiency in general natural language processing (NLP) tasks. Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and exhibit robust generalization across a wide range of tasks. However, these models often encounter performance limitations across multiple tasks due to constrained model capacity. Expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce a novel approach, Parameter-Efficient Sparsity Crafting (PESC), which transitions dense models to sparse models using a Mixture of Experts (MoE) architecture. PESC integrates adapters into the MoE layers of sparse models, differentiating experts without altering the individual weights within these layers. This method significantly reduces computational costs and GPU memory requirements, facilitating model capacity expansion through a minimal increase in parameters via the inserted adapters. Our empirical evaluation demonstrates the effectiveness of the PESC method. Using PESC during instruction tuning, our sparse models, dubbed Camelidae outperform all other opensource sparse models and exhibit superior general capabilities compared to GPT3.5.
- [41] arXiv:2401.02744 [ pdf , ps , other ]
-
Title: MAMI: Multi-Attentional Mutual-Information for Long Sequence Neuron CaptioningAuthors: Alfirsa Damasyifa Fauzulhaq , Wahyu Parwitayasa , Joseph Ananda Sugihdharma , M. Fadli Ridhani , Novanto YudistiraSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Neuron labeling is an approach to visualize the behaviour and respond of a certain neuron to a certain pattern that activates the neuron. Neuron labeling extract information about the features captured by certain neurons in a deep neural network, one of which uses the encoder-decoder image captioning approach. The encoder used can be a pretrained CNN-based model and the decoder is an RNN-based model for text generation. Previous work, namely MILAN (Mutual Information-guided Linguistic Annotation of Neuron), has tried to visualize the neuron behaviour using modified Show, Attend, and Tell (SAT) model in the encoder, and LSTM added with Bahdanau attention in the decoder. MILAN can show great result on short sequence neuron captioning, but it does not show great result on long sequence neuron captioning, so in this work, we would like to improve the performance of MILAN even more by utilizing different kind of attention mechanism and additionally adding several attention result into one, in order to combine all the advantages from several attention mechanism. Using our compound dataset, we obtained higher BLEU and F1-Score on our proposed model, achieving 17.742 and 0.4811 respectively. At some point where the model converges at the peak, our model obtained BLEU of 21.2262 and BERTScore F1-Score of 0.4870.
- [42] arXiv:2401.02749 [ pdf , other ]
-
Title: Hyperparameter-Free Approach for Faster Minimum Bayes Risk DecodingSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Minimum Bayes-Risk (MBR) decoding is shown to be a powerful alternative to beam search decoding for a wide range of text generation tasks. However, MBR requires a huge amount of time for inference to compute the MBR objective, which makes the method infeasible in many situations where response time is critical. Confidence-based pruning (CBP) (Cheng and Vlachos, 2023) has recently been proposed to reduce the inference time in machine translation tasks. Although it is shown to significantly reduce the amount of computation, it requires hyperparameter tuning using a development set to be effective. To this end, we propose Approximate Minimum Bayes-Risk (AMBR) decoding, a hyperparameter-free method to run MBR decoding approximately. AMBR is derived from the observation that the problem of computing the sample-based MBR objective is the medoid identification problem. AMBR uses the Correlated Sequential Halving (CSH) algorithm (Baharav and Tse, 2019), the best approximation algorithm to date for the medoid identification problem, to compute the sample-based MBR objective. We evaluate AMBR on machine translation, text summarization, and image captioning tasks. The results show that AMBR achieves on par with CBP, with CBP selecting hyperparameters through an Oracle for each given computation budget.
- [43] arXiv:2401.02851 [ pdf , ps , other ]
-
Title: Generative Large Language Models are autonomous practitioners of evidence-based medicineAuthors: Akhil Vaid , Joshua Lampert , Juhee Lee , Ashwin Sawant , Donald Apakama , Ankit Sakhuja , Ali Soroush , Denise Lee , Isotta Landi , Nicole Bussola , Ismail Nabeel , Robbie Freeman , Patricia Kovatch , Brendan Carr , Benjamin Glicksberg , Edgar Argulian , Stamatios Lerakis , Monica Kraft , Alexander Charney , Girish NadkarniComments: Word count: 4548 words, Figures: 4, Tables: 4Subjects: Artificial Intelligence (cs.AI)
Abstract: Background: Evidence-based medicine (EBM) is fundamental to modern clinical practice, requiring clinicians to continually update their knowledge and apply the best clinical evidence in patient care. The practice of EBM faces challenges due to rapid advancements in medical research, leading to information overload for clinicians. The integration of artificial intelligence (AI), specifically Generative Large Language Models (LLMs), offers a promising solution towards managing this complexity.
Methods: This study involved the curation of real-world clinical cases across various specialties, converting them into .json files for analysis. LLMs, including proprietary models like ChatGPT 3.5 and 4, Gemini Pro, and open-source models like LLaMA v2 and Mixtral-8x7B, were employed. These models were equipped with tools to retrieve information from case files and make clinical decisions similar to how clinicians must operate in the real world. Model performance was evaluated based on correctness of final answer, judicious use of tools, conformity to guidelines, and resistance to hallucinations.
Results: GPT-4 was most capable of autonomous operation in a clinical setting, being generally more effective in ordering relevant investigations and conforming to clinical guidelines. Limitations were observed in terms of model ability to handle complex guidelines and diagnostic nuances. Retrieval Augmented Generation made recommendations more tailored to patients and healthcare systems.
Conclusions: LLMs can be made to function as autonomous practitioners of evidence-based medicine. Their ability to utilize tooling can be harnessed to interact with the infrastructure of a real-world healthcare system and perform the tasks of patient management in a guideline directed manner. Prompt engineering may help to further enhance this potential and transform healthcare for the clinician and the patient. - [44] arXiv:2401.02863 [ pdf , other ]
-
Title: A Customizable Generator for Comic-Style Visual NarrativeSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: We present a theory-inspired visual narrative generator that incorporates comic-authoring idioms, which transfers the conceptual principles of comics into system layers that integrate the theories to create comic content. The generator creates comics through sequential decision-making across layers from panel composition, object positions, panel transitions, and narrative elements. Each layer's decisions are based on narrative goals and follow the respective layer idioms of the medium. Cohn's narrative grammar provides the overall story arc. Photographic compositions inspired by the rule of thirds is used to provide panel compositions. McCloud's proposed panel transitions based on focus shifts between scene, character, and temporal changes are encoded in the transition layer. Finally, common overlay symbols (such as the exclamation) are added based on analyzing action verbs using an action-verb ontology. We demonstrate the variety of generated comics through various settings with example outputs. The generator and associated modules could be a useful system for visual narrative authoring and for further research into computational models of visual narrative understanding.
- [45] arXiv:2401.03082 [ pdf , other ]
-
Title: UMIE: Unified Multimodal Information Extraction with Instruction TuningSubjects: Artificial Intelligence (cs.AI)
Abstract: Multimodal information extraction (MIE) gains significant attention as the popularity of multimedia content increases. However, current MIE methods often resort to using task-specific model structures, which results in limited generalizability across tasks and underutilizes shared knowledge across MIE tasks. To address these issues, we propose UMIE, a unified multimodal information extractor to unify three MIE tasks as a generation problem using instruction tuning, being able to effectively extract both textual and visual mentions. Extensive experiments show that our single UMIE outperforms various state-of-the-art (SoTA) methods across six MIE datasets on three tasks. Furthermore, in-depth analysis demonstrates UMIE's strong generalization in the zero-shot setting, robustness to instruction variants, and interpretability. Our research serves as an initial step towards a unified MIE model and initiates the exploration into both instruction tuning and large language models within the MIE domain. Our code, data, and model are available at this https URL
- [46] arXiv:2401.03093 [ pdf , ps , other ]
-
Title: XXAI: Explicitly Explainable AI provides transparency in automatic decision-making by overcoming the limitations of symbolic AIComments: 18 pages, 1 graphical abstract, 1 figure, 72 referencesSubjects: Artificial Intelligence (cs.AI) ; Populations and Evolution (q-bio.PE)
Abstract: There are concerns about the reliability and safety of sub-symbolic neural network AI because its decisions cannot be explained explicitly. This is the black box problem of modern AI. At the same time, symbolic AI has the nature of a white box and is able to ensure the reliability and safety of its decisions. However, several problems prevent the widespread use of symbolic AI: the opacity of mathematical models and natural language terms, the lack of a unified ontology, and the combinatorial explosion of search capabilities. To solve the black-box problem of AI, we propose Explicitly Explainable AI (XXAI) - a fully transparent white-box AI based on deterministic logical cellular automata whose rules are derived from the first principles of the general theory of the relevant domain. In this case, the general theory of the domain plays the role of a knowledge base for deriving the inferences of the cellular automata. A cellular automaton implements parallel multi-level logical inference at all levels of organization - from local interactions of the element base to the system as a whole. Our verification of several ecological hypotheses sets a precedent for the successful implementation of the proposed solution. XXAI can automatically verify the reliability, safety, and ethicality of sub-symbolic neural network AI decisions during both the final and training phases. This paper presents the theoretical and methodological foundations for creating XXAI and discusses the prospects for this direction.
- [47] arXiv:2401.03128 [ pdf , other ]
-
Title: Manifold-based Shapley for SAR Recognization Network ExplanationComments: 5 pages, 4 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Explainable artificial intelligence (XAI) holds immense significance in enhancing the deep neural network's transparency and credibility, particularly in some risky and high-cost scenarios, like synthetic aperture radar (SAR). Shapley is a game-based explanation technique with robust mathematical foundations. However, Shapley assumes that model's features are independent, rendering Shapley explanation invalid for high dimensional models. This study introduces a manifold-based Shapley method by projecting high-dimensional features into low-dimensional manifold features and subsequently obtaining Fusion-Shap, which aims at (1) addressing the issue of erroneous explanations encountered by traditional Shap; (2) resolving the challenge of interpretability that traditional Shap faces in complex scenarios.
- [48] arXiv:2401.03188 [ pdf , other ]
-
Title: A Survey on Verification and Validation, Testing and Evaluations of Neurosymbolic Artificial IntelligenceComments: 16 pages, 8 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Neurosymbolic artificial intelligence (AI) is an emerging branch of AI that combines the strengths of symbolic AI and sub-symbolic AI. A major drawback of sub-symbolic AI is that it acts as a "black box", meaning that predictions are difficult to explain, making the testing & evaluation (T&E) and validation & verification (V&V) processes of a system that uses sub-symbolic AI a challenge. Since neurosymbolic AI combines the advantages of both symbolic and sub-symbolic AI, this survey explores how neurosymbolic applications can ease the V&V process. This survey considers two taxonomies of neurosymbolic AI, evaluates them, and analyzes which algorithms are commonly used as the symbolic and sub-symbolic components in current applications. Additionally, an overview of current techniques for the T&E and V&V processes of these components is provided. Furthermore, it is investigated how the symbolic part is used for T&E and V&V purposes in current neurosymbolic applications. Our research shows that neurosymbolic AI as great potential to ease the T&E and V&V processes of sub-symbolic AI by leveraging the possibilities of symbolic AI. Additionally, the applicability of current T&E and V&V methods to neurosymbolic AI is assessed, and how different neurosymbolic architectures can impact these methods is explored. It is found that current T&E and V&V techniques are partly sufficient to test, evaluate, verify, or validate the symbolic and sub-symbolic part of neurosymbolic applications independently, while some of them use approaches where current T&E and V&V methods are not applicable by default, and adjustments or even new approaches are needed. Our research shows that there is great potential in using symbolic AI to test, evaluate, verify, or validate the predictions of a sub-symbolic model, making neurosymbolic AI an interesting research direction for safe, secure, and trustworthy AI.
- [49] arXiv:2401.03194 [ pdf , other ]
-
Title: Learning Persistent Community Structures in Dynamic Networks via Topological Data AnalysisComments: AAAI 2024Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Dynamic community detection methods often lack effective mechanisms to ensure temporal consistency, hindering the analysis of network evolution. In this paper, we propose a novel deep graph clustering framework with temporal consistency regularization on inter-community structures, inspired by the concept of minimal network topological changes within short intervals. Specifically, to address the representation collapse problem, we first introduce MFC, a matrix factorization-based deep graph clustering algorithm that preserves node embedding. Based on static clustering results, we construct probabilistic community networks and compute their persistence homology, a robust topological measure, to assess structural similarity between them. Moreover, a novel neural network regularization TopoReg is introduced to ensure the preservation of topological similarity between inter-community structures over time intervals. Our approach enhances temporal consistency and clustering accuracy on real-world datasets with both fixed and varying numbers of communities. It is also a pioneer application of TDA in temporally persistent community detection, offering an insightful contribution to field of network analysis. Code and data are available at the public git repository: this https URL
- [50] arXiv:2401.03197 [ pdf , other ]
-
Title: Decision Making in Non-Stationary Environments with Policy-Augmented SearchAuthors: Ava Pettet , Yunuo Zhang , Baiting Luo , Kyle Wray , Hendrik Baier , Aron Laszka , Abhishek Dubey , Ayan MukhopadhyayComments: Extended Abstract accepted for presentation at AAMAS 2024Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings -- on the one hand, policies learned before execution become stale when the environment changes and relearning takes both time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce \textit{Policy-Augmented Monte Carlo tree search} (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines.
- [51] arXiv:2401.03408 [ pdf , other ]
-
Title: Escalation Risks from Language Models in Military and Diplomatic Decision-MakingAuthors: Juan-Pablo Rivera , Gabriel Mukobi , Anka Reuel , Max Lamparth , Chandler Smith , Jacquelyn SchneiderComments: 10 pages body, 57 pages appendix, 46 figures, 11 tablesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computers and Society (cs.CY); Multiagent Systems (cs.MA)
Abstract: Governments are increasingly considering integrating autonomous AI agents in high-stakes military and foreign-policy decision-making, especially with the emergence of advanced generative AI models like GPT-4. Our work aims to scrutinize the behavior of multiple AI agents in simulated wargames, specifically focusing on their predilection to take escalatory actions that may exacerbate multilateral conflicts. Drawing on political science and international relations literature about escalation dynamics, we design a novel wargame simulation and scoring framework to assess the escalation risks of actions taken by these agents in different scenarios. Contrary to prior studies, our research provides both qualitative and quantitative insights and focuses on large language models (LLMs). We find that all five studied off-the-shelf LLMs show forms of escalation and difficult-to-predict escalation patterns. We observe that models tend to develop arms-race dynamics, leading to greater conflict, and in rare cases, even to the deployment of nuclear weapons. Qualitatively, we also collect the models' reported reasonings for chosen actions and observe worrying justifications based on deterrence and first-strike tactics. Given the high stakes of military and foreign-policy contexts, we recommend further examination and cautious consideration before deploying autonomous language model agents for strategic military or diplomatic decision-making.
- [52] arXiv:2401.03410 [ pdf , other ]
-
Title: Engineering Features to Improve Pass Prediction in Soccer Simulation 2D GamesSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: Soccer Simulation 2D (SS2D) is a simulation of a real soccer game in two dimensions. In soccer, passing behavior is an essential action for keeping the ball in possession of our team and creating goal opportunities. Similarly, for SS2D, predicting the passing behaviors of both opponents and our teammates helps manage resources and score more goals. Therefore, in this research, we have tried to address the modeling of passing behavior of soccer 2D players using Deep Neural Networks (DNN) and Random Forest (RF). We propose an embedded data extraction module that can record the decision-making of agents in an online format. Afterward, we apply four data sorting techniques for training data preparation. After, we evaluate the trained models' performance playing against 6 top teams of RoboCup 2019 that have distinctive playing strategies. Finally, we examine the importance of different feature groups on the prediction of a passing strategy. All results in each step of this work prove our suggested methodology's effectiveness and improve the performance of the pass prediction in Soccer Simulation 2D games ranging from 5\% (e.g., playing against the same team) to 10\% (e.g., playing against Robocup top teams).
- [53] arXiv:2401.03428 [ pdf , other ]
-
Title: Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and ProspectsAuthors: Yuheng Cheng , Ceyao Zhang , Zhengwen Zhang , Xiangrui Meng , Sirui Hong , Wenhao Li , Zihao Wang , Zekai Wang , Feng Yin , Junhua Zhao , Xiuqiang HeSubjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: Intelligent agents stand out as a potential path toward artificial general intelligence (AGI). Thus, researchers have dedicated significant effort to diverse implementations for them. Benefiting from recent progress in large language models (LLMs), LLM-based agents that use universal natural language as an interface exhibit robust generalization capabilities across various applications -- from serving as autonomous general-purpose task assistants to applications in coding, social, and economic domains, LLM-based agents offer extensive exploration opportunities. This paper surveys current research to provide an in-depth overview of LLM-based intelligent agents within single-agent and multi-agent systems. It covers their definitions, research frameworks, and foundational components such as their composition, cognitive and planning methods, tool utilization, and responses to environmental feedback. We also delve into the mechanisms of deploying LLM-based agents in multi-agent systems, including multi-role collaboration, message passing, and strategies to alleviate communication issues between agents. The discussions also shed light on popular datasets and application scenarios. We conclude by envisioning prospects for LLM-based agents, considering the evolving landscape of AI and natural language processing.
- [54] arXiv:2401.03454 [ pdf , other ]
-
Title: Computational Argumentation-based Chatbots: a SurveySubjects: Artificial Intelligence (cs.AI)
Abstract: Chatbots are conversational software applications designed to interact dialectically with users for a plethora of different purposes. Surprisingly, these colloquial agents have only recently been coupled with computational models of arguments (i.e. computational argumentation), whose aim is to formalise, in a machine-readable format, the ordinary exchange of information that characterises human communications. Chatbots may employ argumentation with different degrees and in a variety of manners. The present survey sifts through the literature to review papers concerning this kind of argumentation-based bot, drawing conclusions about the benefits and drawbacks that this approach entails in comparison with standard chatbots, while also envisaging possible future development and integration with the Transformer-based architecture and state-of-the-art Large Language models.
- [55] arXiv:2401.03504 [ pdf , other ]
-
Title: ClusterComm: Discrete Communication in Decentralized MARL using Internal Representation ClusteringAuthors: Robert Müller , Hasan Turalic , Thomy Phan , Michael Kölle , Jonas Nüßlein , Claudia Linnhoff-PopienComments: Accepted at ICAART 2024Subjects: Artificial Intelligence (cs.AI)
Abstract: In the realm of Multi-Agent Reinforcement Learning (MARL), prevailing approaches exhibit shortcomings in aligning with human learning, robustness, and scalability. Addressing this, we introduce ClusterComm, a fully decentralized MARL framework where agents communicate discretely without a central control unit. ClusterComm utilizes Mini-Batch-K-Means clustering on the last hidden layer's activations of an agent's policy network, translating them into discrete messages. This approach outperforms no communication and competes favorably with unbounded, continuous communication and hence poses a simple yet effective strategy for enhancing collaborative task-solving in MARL.
- [56] arXiv:2401.03529 [ pdf , other ]
-
Title: Quantifying stability of non-power-seeking in artificial agentsAuthors: Evan Ryan Gunter (1), Yevgeny Liokumovich (2), Victoria Krakovna (3) ((1) ML Alignment & Theory Scholars (MATS), (2) University of Toronto, (3) Google DeepMind)Comments: 37 pages, 5 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: We investigate the question: if an AI agent is known to be safe in one setting, is it also safe in a new setting similar to the first? This is a core question of AI alignment--we train and test models in a certain environment, but deploy them in another, and we need to guarantee that models that seem safe in testing remain so in deployment. Our notion of safety is based on power-seeking--an agent which seeks power is not safe. In particular, we focus on a crucial type of power-seeking: resisting shutdown. We model agents as policies for Markov decision processes, and show (in two cases of interest) that not resisting shutdown is "stable": if an MDP has certain policies which don't avoid shutdown, the corresponding policies for a similar MDP also don't avoid shutdown. We also show that there are natural cases where safety is _not_ stable--arbitrarily small perturbations may result in policies which never shut down. In our first case of interest--near-optimal policies--we use a bisimulation metric on MDPs to prove that small perturbations won't make the agent take longer to shut down. Our second case of interest is policies for MDPs satisfying certain constraints which hold for various models (including language models). Here, we demonstrate a quantitative bound on how fast the probability of not shutting down can increase: by defining a metric on MDPs; proving that the probability of not shutting down, as a function on MDPs, is lower semicontinuous; and bounding how quickly this function decreases.
- [57] arXiv:2401.03546 [ pdf , other ]
-
Title: NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open WorldsComments: Accepted at AAMAS-2024Subjects: Artificial Intelligence (cs.AI)
Abstract: As AI agents leave the lab and venture into the real world as autonomous vehicles, delivery robots, and cooking robots, it is increasingly necessary to design and comprehensively evaluate algorithms that tackle the ``open-world''. To this end, we introduce NovelGym, a flexible and adaptable ecosystem designed to simulate gridworld environments, serving as a robust platform for benchmarking reinforcement learning (RL) and hybrid planning and learning agents in open-world contexts. The modular architecture of NovelGym facilitates rapid creation and modification of task environments, including multi-agent scenarios, with multiple environment transformations, thus providing a dynamic testbed for researchers to develop open-world AI agents.
- [58] arXiv:2401.03568 [ pdf , other ]
-
Title: Agent AI: Surveying the Horizons of Multimodal InteractionAuthors: Zane Durante , Qiuyuan Huang , Naoki Wake , Ran Gong , Jae Sung Park , Bidipta Sarkar , Rohan Taori , Yusuke Noda , Demetri Terzopoulos , Yejin Choi , Katsushi Ikeuchi , Hoi Vo , Li Fei-Fei , Jianfeng GaoSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: Multi-modal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems leverage existing foundation models as the basic building blocks for the creation of embodied agents. Embedding agents within such environments facilitates the ability of models to process and interpret visual and contextual data, which is critical for the creation of more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can be used to inform and direct agent responses within the given environment. To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data, and can produce meaningful embodied actions. In particular, we explore systems that aim to improve agents based on next-embodied action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback. We argue that by developing agentic AI systems in grounded environments, one can also mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs. The emerging field of Agent AI subsumes the broader embodied and agentic aspects of multimodal interactions. Beyond agents acting and interacting in the physical world, we envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
- [59] arXiv:2401.03880 [ pdf , ps , other ]
-
Title: Metaheuristics for (Variable-Size) Mixed Optimization Problems: A Unified Taxonomy and SurveyAuthors: Prof. El-Ghazali TalbiSubjects: Artificial Intelligence (cs.AI) ; Discrete Mathematics (cs.DM); Neural and Evolutionary Computing (cs.NE)
Abstract: Many real world optimization problems are formulated as mixed-variable optimization problems (MVOPs) which involve both continuous and discrete variables. MVOPs including dimensional variables are characterized by a variable-size search space. Depending on the values of dimensional variables, the number and type of the variables of the problem can vary dynamically. MVOPs and variable-size MVOPs (VMVOPs) are difficult to solve and raise a number of scientific challenges in the design of metaheuristics. Standard metaheuristics have been first designed to address continuous or discrete optimization problems, and are not able to tackle (V)MVOPs in an efficient way. The development of metaheuristics for solving such problems has attracted the attention of many researchers and is increasingly popular. However, to our knowledge there is no well established taxonomy and comprehensive survey for handling this important family of optimization problems.
This paper presents a unified taxonomy for metaheuristic solutions for solving (V)MVOPs in an attempt to provide a common terminology and classification mechanisms. It provides a general mathematical formulation and concepts of (V)MVOPs, and identifies the various solving methodologies than can be applied in metaheuristics. The advantages, the weaknesses and the limitations of the presented methodologies are discussed. The proposed taxonomy also allows to identify some open research issues which needs further in-depth investigations. - [60] arXiv:2401.03991 [ pdf , other ]
-
Title: Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame BenchmarkComments: Camera-Ready version for AAAI 2024Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Databases (cs.DB); Logic in Computer Science (cs.LO)
Abstract: Artificial intelligence (AI) has made remarkable progress across various domains, with large language models like ChatGPT gaining substantial attention for their human-like text-generation capabilities. Despite these achievements, spatial reasoning remains a significant challenge for these models. Benchmarks like StepGame evaluate AI spatial reasoning, where ChatGPT has shown unsatisfactory performance. However, the presence of template errors in the benchmark has an impact on the evaluation results. Thus there is potential for ChatGPT to perform better if these template errors are addressed, leading to more accurate assessments of its spatial reasoning capabilities. In this study, we refine the StepGame benchmark, providing a more accurate dataset for model evaluation. We analyze GPT's spatial reasoning performance on the rectified benchmark, identifying proficiency in mapping natural language text to spatial relations but limitations in multi-hop reasoning. We provide a flawless solution to the benchmark by combining template-to-relation mapping with logic-based reasoning. This combination demonstrates proficiency in performing qualitative reasoning on StepGame without encountering any errors. We then address the limitations of GPT models in spatial reasoning. We deploy Chain-of-thought and Tree-of-thoughts prompting strategies, offering insights into GPT's ``cognitive process", and achieving remarkable improvements in accuracy. Our investigation not only sheds light on model deficiencies but also proposes enhancements, contributing to the advancement of AI with more robust spatial reasoning capabilities.
- [61] arXiv:2401.04374 [ pdf , other ]
-
Title: Towards Explainable Artificial Intelligence (XAI): A Data Mining PerspectiveAuthors: Haoyi Xiong , Xuhong Li , Xiaofei Zhang , Jiamin Chen , Xinhao Sun , Yuchen Li , Zeyi Sun , Mengnan DuSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Given the complexity and lack of transparency in deep neural networks (DNNs), extensive efforts have been made to make these systems more interpretable or explain their behaviors in accessible terms. Unlike most reviews, which focus on algorithmic and model-centric perspectives, this work takes a "data-centric" view, examining how data collection, processing, and analysis contribute to explainable AI (XAI). We categorize existing work into three categories subject to their purposes: interpretations of deep models, referring to feature attributions and reasoning processes that correlate data points with model outputs; influences of training data, examining the impact of training data nuances, such as data valuation and sample anomalies, on decision-making processes; and insights of domain knowledge, discovering latent patterns and fostering new knowledge from data and models to advance social values and scientific discovery. Specifically, we distill XAI methodologies into data mining operations on training and testing data across modalities, such as images, text, and tabular data, as well as on training logs, checkpoints, models and other DNN behavior descriptors. In this way, our study offers a comprehensive, data-centric examination of XAI from a lens of data mining methods and applications.
- [62] arXiv:2401.04429 [ pdf , other ]
-
Title: i-Rebalance: Personalized Vehicle Repositioning for Supply Demand BalanceAuthors: Haoyang Chen , Peiyan Sun , Qiyuan Song , Wanyuan Wang , Weiwei Wu , Wencan Zhang , Guanyu Gao , Yan LyuSubjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: Ride-hailing platforms have been facing the challenge of balancing demand and supply. Existing vehicle reposition techniques often treat drivers as homogeneous agents and relocate them deterministically, assuming compliance with the reposition. In this paper, we consider a more realistic and driver-centric scenario where drivers have unique cruising preferences and can decide whether to take the recommendation or not on their own. We propose i-Rebalance, a personalized vehicle reposition technique with deep reinforcement learning (DRL). i-Rebalance estimates drivers' decisions on accepting reposition recommendations through an on-field user study involving 99 real drivers. To optimize supply-demand balance and enhance preference satisfaction simultaneously, i-Rebalance has a sequential reposition strategy with dual DRL agents: Grid Agent to determine the reposition order of idle vehicles, and Vehicle Agent to provide personalized recommendations to each vehicle in the pre-defined order. This sequential learning strategy facilitates more effective policy training within a smaller action space compared to traditional joint-action methods. Evaluation of real-world trajectory data shows that i-Rebalance improves driver acceptance rate by 38.07% and total driver income by 9.97%.
- [63] arXiv:2401.04631 [ pdf , other ]
-
Title: Deep Reinforcement Multi-agent Learning framework for Information Gathering with Local Gaussian Processes for Water MonitoringAuthors: Samuel Yanes Luis , Dmitriy Shutin , Juan Marchal Gómez , Daniel Gutiérrez Reina , Sergio Toral MarínSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: The conservation of hydrological resources involves continuously monitoring their contamination. A multi-agent system composed of autonomous surface vehicles is proposed in this paper to efficiently monitor the water quality. To achieve a safe control of the fleet, the fleet policy should be able to act based on measurements and to the the fleet state. It is proposed to use Local Gaussian Processes and Deep Reinforcement Learning to jointly obtain effective monitoring policies. Local Gaussian processes, unlike classical global Gaussian processes, can accurately model the information in a dissimilar spatial correlation which captures more accurately the water quality information. A Deep convolutional policy is proposed, that bases the decisions on the observation on the mean and variance of this model, by means of an information gain reward. Using a Double Deep Q-Learning algorithm, agents are trained to minimize the estimation error in a safe manner thanks to a Consensus-based heuristic. Simulation results indicate an improvement of up to 24% in terms of the mean absolute error with the proposed models. Also, training results with 1-3 agents indicate that our proposed approach returns 20% and 24% smaller average estimation errors for, respectively, monitoring water quality variables and monitoring algae blooms, as compared to state-of-the-art approaches
- [64] arXiv:2401.04812 [ pdf , other ]
-
Title: Sample-and-Bound for Non-Convex OptimizationComments: Published at AAAI 2024. Code is available at this https URLSubjects: Artificial Intelligence (cs.AI)
Abstract: Standard approaches for global optimization of non-convex functions, such as branch-and-bound, maintain partition trees to systematically prune the domain. The tree size grows exponentially in the number of dimensions. We propose new sampling-based methods for non-convex optimization that adapts Monte Carlo Tree Search (MCTS) to improve efficiency. Instead of the standard use of visitation count in Upper Confidence Bounds, we utilize numerical overapproximations of the objective as an uncertainty metric, and also take into account of sampled estimates of first-order and second-order information. The Monte Carlo tree in our approach avoids the usual fixed combinatorial patterns in growing the tree, and aggressively zooms into the promising regions, while still balancing exploration and exploitation. We evaluate the proposed algorithms on high-dimensional non-convex optimization benchmarks against competitive baselines and analyze the effects of the hyper parameters.
- [65] arXiv:2401.05133 [ pdf , other ]
-
Title: Neural Population Learning beyond Symmetric Zero-sum GamesSubjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: We study computationally efficient methods for finding equilibria in n-player general-sum games, specifically ones that afford complex visuomotor skills. We show how existing methods would struggle in this setting, either computationally or in theory. We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game. We show empirical convergence in a suite of OpenSpiel games, validated rigorously by exact game solvers. We then deploy NeuPL-JPSRO to complex domains, where our approach enables adaptive coordination in a MuJoCo control domain and skill transfer in capture-the-flag. Our work shows that equilibrium convergent population learning can be implemented at scale and in generality, paving the way towards solving real-world games between heterogeneous players with mixed motives.
- [66] arXiv:2401.05134 [ pdf , other ]
-
Title: Yes, this is what I was looking for! Towards Multi-modal Medical Consultation Concern Summary GenerationSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Over the past few years, the use of the Internet for healthcare-related tasks has grown by leaps and bounds, posing a challenge in effectively managing and processing information to ensure its efficient utilization. During moments of emotional turmoil and psychological challenges, we frequently turn to the internet as our initial source of support, choosing this over discussing our feelings with others due to the associated social stigma. In this paper, we propose a new task of multi-modal medical concern summary (MMCS) generation, which provides a short and precise summary of patients' major concerns brought up during the consultation. Nonverbal cues, such as patients' gestures and facial expressions, aid in accurately identifying patients' concerns. Doctors also consider patients' personal information, such as age and gender, in order to describe the medical condition appropriately. Motivated by the potential efficacy of patients' personal context and visual gestures, we propose a transformer-based multi-task, multi-modal intent-recognition, and medical concern summary generation (IR-MMCSG) system. Furthermore, we propose a multitasking framework for intent recognition and medical concern summary generation for doctor-patient consultations. We construct the first multi-modal medical concern summary generation (MM-MediConSummation) corpus, which includes patient-doctor consultations annotated with medical concern summaries, intents, patient personal information, doctor's recommendations, and keywords. Our experiments and analysis demonstrate (a) the significant role of patients' expressions/gestures and their personal information in intent identification and medical concern summary generation, and (b) the strong correlation between intent recognition and patients' medical concern summary generation
The dataset and source code are available at this https URL . - [67] arXiv:2401.05654 [ pdf , other ]
-
Title: Towards Conversational Diagnostic AIAuthors: Tao Tu , Anil Palepu , Mike Schaekermann , Khaled Saab , Jan Freyberg , Ryutaro Tanno , Amy Wang , Brenna Li , Mohamed Amin , Nenad Tomasev , Shekoofeh Azizi , Karan Singhal , Yong Cheng , Le Hou , Albert Webson , Kavita Kulkarni , S Sara Mahdavi , Christopher Semturs , Juraj Gottweis , Joelle Barral , Katherine Chou , Greg S Corrado , Yossi Matias , Alan Karthikesalingam , Vivek NatarajanComments: 46 pages, 5 figures in main text, 19 figures in appendixSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue.
AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI. - [68] arXiv:2401.05743 [ pdf , ps , other ]
-
Title: Consistent Query Answering for Existential Rules with Closed PredicatesComments: 31 pages. arXiv admin note: text overlap with arXiv:2207.09198Subjects: Artificial Intelligence (cs.AI)
Abstract: Consistent Query Answering (CQA) is an inconsistency-tolerant approach to data access in knowledge bases and databases. The goal of CQA is to provide meaningful (consistent) answers to queries even in the presence of inconsistent information, e.g. a database whose data conflict with meta-data (typically the database integrity constraints). The semantics of CQA is based on the notion of repair, that is, a consistent version of the initial, inconsistent database that is obtained through minimal modifications. We study CQA in databases with data dependencies expressed by existential rules. More specifically, we focus on the broad class of disjunctive embedded dependencies with inequalities (DEDs), which extend both tuple-generating dependencies and equality-generated dependencies. We first focus on the case when the database predicates are closed, i.e. the database is assumed to have complete knowledge about such predicates, thus no tuple addition is possible to repair the database. In such a scenario, we provide a detailed analysis of the data complexity of CQA and associated tasks (repair checking) under different semantics (AR and IAR) and for different classes of existential rules. In particular, we consider the classes of acyclic, linear, full, sticky and guarded DEDs, and their combinations.
- [69] arXiv:2401.05822 [ pdf , other ]
-
Title: Towards Goal-Oriented Agents for Evolving Problems Observed via ConversationComments: 15 pages, 7 figuresJournal-ref: Artificial Intelligence XL. SGAI 2023. Lecture Notes in Computer Science, vol 14381. 142-155Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: The objective of this work is to train a chatbot capable of solving evolving problems through conversing with a user about a problem the chatbot cannot directly observe. The system consists of a virtual problem (in this case a simple game), a simulated user capable of answering natural language questions that can observe and perform actions on the problem, and a Deep Q-Network (DQN)-based chatbot architecture. The chatbot is trained with the goal of solving the problem through dialogue with the simulated user using reinforcement learning. The contributions of this paper are as follows: a proposed architecture to apply a conversational DQN-based agent to evolving problems, an exploration of training methods such as curriculum learning on model performance and the effect of modified reward functions in the case of increasing environment complexity.
- [70] arXiv:2401.05960 [ pdf , other ]
-
Title: Machine Learning Insides OptVerse AI Solver: Design Principles and ApplicationsAuthors: Xijun Li , Fangzhou Zhu , Hui-Ling Zhen , Weilin Luo , Meng Lu , Yimin Huang , Zhenan Fan , Zirui Zhou , Yufei Kuang , Zhihai Wang , Zijie Geng , Yang Li , Haoyang Liu , Zhiwu An , Muming Yang , Jianshu Li , Jie Wang , Junchi Yan , Defeng Sun , Tao Zhong , Yong Zhang , Jia Zeng , Mingxuan Yuan , Jianye Hao , Jun Yao , Kun MaoSubjects: Artificial Intelligence (cs.AI)
Abstract: In an era of digital ubiquity, efficient resource management and decision-making are paramount across numerous industries. To this end, we present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI Solver, which aims to mitigate the scarcity of real-world mathematical programming instances, and to surpass the capabilities of traditional optimization techniques. We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror multifaceted structures of real-world problem. Furthermore, we introduce a training framework leveraging augmentation policies to maintain solvers' utility in dynamic environments. Besides the data generation and augmentation, our proposed approaches also include novel ML-driven policies for personalized solver strategies, with an emphasis on applications like graph convolutional networks for initial basis selection and reinforcement learning for advanced presolving and cut selection. Additionally, we detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance. Compared with traditional solvers such as Cplex and SCIP, our ML-augmented OptVerse AI Solver demonstrates superior speed and precision across both established benchmarks and real-world scenarios, reinforcing the practical imperative and effectiveness of machine learning techniques in mathematical programming solvers.
- [71] arXiv:2401.06072 [ pdf , other ]
-
Title: Chain of History: Learning and Forecasting with LLMs for Temporal Knowledge Graph CompletionComments: 15 pages; typos corrected, references addedSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Temporal Knowledge Graph Completion (TKGC) is a complex task involving the prediction of missing event links at future timestamps by leveraging established temporal structural knowledge. This paper aims to provide a comprehensive perspective on harnessing the advantages of Large Language Models (LLMs) for reasoning in temporal knowledge graphs, presenting an easily transferable pipeline. In terms of graph modality, we underscore the LLMs' prowess in discerning the structural information of pivotal nodes within the historical chain. As for the generation mode of the LLMs utilized for inference, we conduct an exhaustive exploration into the variances induced by a range of inherent factors in LLMs, with particular attention to the challenges in comprehending reverse logic. We adopt a parameter-efficient fine-tuning strategy to harmonize the LLMs with the task requirements, facilitating the learning of the key knowledge highlighted earlier. Comprehensive experiments are undertaken on several widely recognized datasets, revealing that our framework exceeds or parallels existing methods across numerous popular metrics. Additionally, we execute a substantial range of ablation experiments and draw comparisons with several advanced commercial LLMs, to investigate the crucial factors influencing LLMs' performance in structured temporal knowledge inference tasks.
- [72] arXiv:2401.06080 [ pdf , other ]
-
Title: Secrets of RLHF in Large Language Models Part II: Reward ModelingAuthors: Binghai Wang , Rui Zheng , Lu Chen , Yan Liu , Shihan Dou , Caishuang Huang , Wei Shen , Senjie Jin , Enyu Zhou , Chenyu Shi , Songyang Gao , Nuo Xu , Yuhao Zhou , Xiaoran Fan , Zhiheng Xi , Jun Zhao , Xiao Wang , Tao Ji , Hang Yan , Lixing Shen , Zhan Chen , Tao Gui , Qi Zhang , Xipeng Qiu , Xuanjing Huang , Zuxuan Wu , Yu-Gang JiangSubjects: Artificial Intelligence (cs.AI)
Abstract: Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as proxies for human preferences to drive reinforcement learning optimization. While reward models are often considered central to achieving high performance, they face the following challenges in practical applications: (1) Incorrect and ambiguous preference pairs in the dataset may hinder the reward model from accurately capturing human intent. (2) Reward models trained on data from a specific distribution often struggle to generalize to examples outside that distribution and are not suitable for iterative RLHF training.
In this report, we attempt to address these two issues. (1) From a data perspective, we propose a method to measure the strength of preferences within the data, based on a voting mechanism of multiple reward models. Experimental results confirm that data with varying preference strengths have different impacts on reward model performance. We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset and fully leverage high-quality preference data. (2) From an algorithmic standpoint, we introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses, thereby improving model generalization. Furthermore, we employ meta-learning to enable the reward model to maintain the ability to differentiate subtle differences in out-of-distribution samples, and this approach can be utilized for iterative RLHF optimization. - [73] arXiv:2401.06256 [ pdf , ps , other ]
-
Title: A Universal Knowledge Model and Cognitive Architecture for Prototyping AGISubjects: Artificial Intelligence (cs.AI)
Abstract: The article identified 42 cognitive architectures for creating general artificial intelligence (AGI) and proposed a set of interrelated functional blocks that an agent approaching AGI in its capabilities should possess. Since the required set of blocks is not found in any of the existing architectures, the article proposes a new cognitive architecture for intelligent systems approaching AGI in their capabilities. As one of the key solutions within the framework of the architecture, a universal method of knowledge representation is proposed, which allows combining various non-formalized, partially and fully formalized methods of knowledge representation in a single knowledge base, such as texts in natural languages, images, audio and video recordings, graphs, algorithms, databases, neural networks, knowledge graphs, ontologies, frames, essence-property-relation models, production systems, predicate calculus models, conceptual models, and others. To combine and structure various fragments of knowledge, archigraph models are used, constructed as a development of annotated metagraphs. As components, the cognitive architecture being developed includes machine consciousness, machine subconsciousness, blocks of interaction with the external environment, a goal management block, an emotional control system, a block of social interaction, a block of reflection, an ethics block and a worldview block, a learning block, a monitoring block, blocks of statement and solving problems, self-organization and meta learning block.
- [74] arXiv:2401.06293 [ pdf , other ]
-
Title: MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in Recommendation SystemsAuthors: Qiang Charles Xiao , Ajith Muralidharan , Birjodh Tiwana , Johnson Jia , Fedor Borisyuk , Aman Gupta , Dawn WoodardComments: 10 pagesSubjects: Artificial Intelligence (cs.AI) ; Information Retrieval (cs.IR)
Abstract: In this paper, we propose a generic model-based re-ranking framework, MultiSlot ReRanker, which simultaneously optimizes relevance, diversity, and freshness. Specifically, our Sequential Greedy Algorithm (SGA) is efficient enough (linear time complexity) for large-scale production recommendation engines. It achieved a lift of $+6\%$ to $ +10\%$ offline Area Under the receiver operating characteristic Curve (AUC) which is mainly due to explicitly modeling mutual influences among items of a list, and leveraging the second pass ranking scores of multiple objectives. In addition, we have generalized the offline replay theory to multi-slot re-ranking scenarios, with trade-offs among multiple objectives. The offline replay results can be further improved by Pareto Optimality. Moreover, we've built a multi-slot re-ranking simulator based on OpenAI Gym integrated with the Ray framework. It can be easily configured for different assumptions to quickly benchmark both reinforcement learning and supervised learning algorithms.
- [75] arXiv:2401.06375 [ pdf , ps , other ]
-
Title: Cognitive BPM as an Equalizer: Improving Access and Efficiency for Employees with (and without) Cognitive DisabilitiesComments: 7 pages, 2 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: We examine ProcessGPT, an AI model designed to automate, augment, and improve business processes, to study the challenges of managing business processes within the cognitive limitations of the human workforce, particularly individuals with cognitive disabilities. ProcessGPT provides a blueprint for designing efficient business processes that take into account human cognitive limitations. By viewing this through the lens of cognitive disabilities, we show that ProcessGPT improves process usability for individuals with and without cognitive disabilities. We also demonstrate that organizations implementing ProcessGPT-like capabilities will realize increased productivity, morale, and inclusion.
- [76] arXiv:2401.06379 [ pdf , other ]
-
Title: Vehicle: Bridging the Embedding Gap in the Verification of Neuro-Symbolic ProgramsAuthors: Matthew L. Daggitt , Wen Kokke , Robert Atkey , Natalia Slusarz , Luca Arnaboldi , Ekaterina KomendantskayaSubjects: Artificial Intelligence (cs.AI)
Abstract: Neuro-symbolic programs -- programs containing both machine learning components and traditional symbolic code -- are becoming increasingly widespread. However, we believe that there is still a lack of a general methodology for verifying these programs whose correctness depends on the behaviour of the machine learning components. In this paper, we identify the ``embedding gap'' -- the lack of techniques for linking semantically-meaningful ``problem-space'' properties to equivalent ``embedding-space'' properties -- as one of the key issues, and describe Vehicle, a tool designed to facilitate the end-to-end verification of neural-symbolic programs in a modular fashion. Vehicle provides a convenient language for specifying ``problem-space'' properties of neural networks and declaring their relationship to the ``embedding-space", and a powerful compiler that automates interpretation of these properties in the language of a chosen machine-learning training environment, neural network verifier, and interactive theorem prover. We demonstrate Vehicle's utility by using it to formally verify the safety of a simple autonomous car equipped with a neural network controller.
- [77] arXiv:2401.06465 [ pdf , other ]
-
Title: Sanity Checks Revisited: An Exploration to Repair the Model Parameter Randomisation TestComments: 19 pages, 12 figures, NeurIPS XAIA 2023Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Methodology (stat.ME)
Abstract: The Model Parameter Randomisation Test (MPRT) is widely acknowledged in the eXplainable Artificial Intelligence (XAI) community for its well-motivated evaluative principle: that the explanation function should be sensitive to changes in the parameters of the model function. However, recent works have identified several methodological caveats for the empirical interpretation of MPRT. To address these caveats, we introduce two adaptations to the original MPRT -- Smooth MPRT and Efficient MPRT, where the former minimises the impact that noise has on the evaluation results through sampling and the latter circumvents the need for biased similarity measurements by re-interpreting the test through the explanation's rise in complexity, after full parameter randomisation. Our experimental results demonstrate that these proposed variants lead to improved metric reliability, thus enabling a more trustworthy application of XAI methods.
- [78] arXiv:2401.06471 [ pdf , other ]
-
Title: A Brain-inspired Computational Model for Human-like Concept LearningSubjects: Artificial Intelligence (cs.AI)
Abstract: Concept learning is a fundamental aspect of human cognition and plays a critical role in mental processes such as categorization, reasoning, memory, and decision-making. Researchers across various disciplines have shown consistent interest in the process of concept acquisition in individuals. To elucidate the mechanisms involved in human concept learning, this study examines the findings from computational neuroscience and cognitive psychology. These findings indicate that the brain's representation of concepts relies on two essential components: multisensory representation and text-derived representation. These two types of representations are coordinated by a semantic control system, ultimately leading to the acquisition of concepts. Drawing inspiration from this mechanism, the study develops a human-like computational model for concept learning based on spiking neural networks. By effectively addressing the challenges posed by diverse sources and imbalanced dimensionality of the two forms of concept representations, the study successfully attains human-like concept representations. Tests involving similar concepts demonstrate that our model, which mimics the way humans learn concepts, yields representations that closely align with human cognition.
- [79] arXiv:2401.06781 [ pdf , other ]
-
Title: PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language ModelSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Poker, also known as Texas Hold'em, has always been a typical research target within imperfect information games (IIGs). IIGs have long served as a measure of artificial intelligence (AI) development. Representative prior works, such as DeepStack and Libratus heavily rely on counterfactual regret minimization (CFR) to tackle heads-up no-limit Poker. However, it is challenging for subsequent researchers to learn CFR from previous models and apply it to other real-world applications due to the expensive computational cost of CFR iterations. Additionally, CFR is difficult to apply to multi-player games due to the exponential growth of the game tree size. In this work, we introduce PokerGPT, an end-to-end solver for playing Texas Hold'em with arbitrary number of players and gaining high win rates, established on a lightweight large language model (LLM). PokerGPT only requires simple textual information of Poker games for generating decision-making advice, thus guaranteeing the convenient interaction between AI and humans. We mainly transform a set of textual records acquired from real games into prompts, and use them to fine-tune a lightweight pre-trained LLM using reinforcement learning human feedback technique. To improve fine-tuning performance, we conduct prompt engineering on raw data, including filtering useful information, selecting behaviors of players with high win rates, and further processing them into textual instruction using multiple prompt engineering techniques. Through the experiments, we demonstrate that PokerGPT outperforms previous approaches in terms of win rate, model size, training time, and response speed, indicating the great potential of LLMs in solving IIGs.
- [80] arXiv:2401.06793 [ pdf , ps , other ]
-
Title: Greedy Algorithm for Inference of Decision Trees from Decision Rule SystemsSubjects: Artificial Intelligence (cs.AI)
Abstract: Decision trees and decision rule systems play important roles as classifiers, knowledge representation tools, and algorithms. They are easily interpretable models for data analysis, making them widely used and studied in computer science. Understanding the relationships between these two models is an important task in this field. There are well-known methods for converting decision trees into systems of decision rules. In this paper, we consider the inverse transformation problem, which is not so simple. Instead of constructing an entire decision tree, our study focuses on a greedy polynomial time algorithm that simulates the operation of a decision tree on a given tuple of attribute values.
- [81] arXiv:2401.06801 [ pdf , ps , other ]
-
Title: Graph-of-Thought: Utilizing Large Language Models to Solve Complex and Dynamic Business ProblemsAuthors: Ye LiComments: Keywords: Graph-of-Thought (GoT), Workflow Automation, Large Language Models (LLMs), Task Execution, Data-Driven Decision Making, Complexity ManagementSubjects: Artificial Intelligence (cs.AI)
Abstract: This paper presents Graph-of-Thought (GoT), a new model for workflow automation that enhances the flexibility and efficiency of Large Language Models (LLMs) in complex task execution. GoT advances beyond traditional linear and tree-like cognitive models with a graph structure that enables dynamic path selection. The open-source engine GoTFlow demonstrates the practical application of GoT, facilitating automated, data-driven decision-making across various domains. Despite challenges in complexity and transparency, GoTFlow's potential for improving business processes is significant, promising advancements in both efficiency and decision quality with continuous development.
- [82] arXiv:2401.06810 [ pdf , other ]
-
Title: TONE: A 3-Tiered ONtology for Emotion analysisSubjects: Artificial Intelligence (cs.AI)
Abstract: Emotions have played an important part in many sectors, including psychology, medicine, mental health, computer science, and so on, and categorizing them has proven extremely useful in separating one emotion from another. Emotions can be classified using the following two methods: (1) The supervised method's efficiency is strongly dependent on the size and domain of the data collected. A categorization established using relevant data from one domain may not work well in another. (2) An unsupervised method that uses either domain expertise or a knowledge base of emotion types already exists. Though this second approach provides a suitable and generic categorization of emotions and is cost-effective, the literature doesn't possess a publicly available knowledge base that can be directly applied to any emotion categorization-related task. This pushes us to create a knowledge base that can be used for emotion classification across domains, and ontology is often used for this purpose. In this study, we provide TONE, an emotion-based ontology that effectively creates an emotional hierarchy based on Dr. Gerrod Parrot's group of emotions. In addition to ontology development, we introduce a semi-automated vocabulary construction process to generate a detailed collection of terms for emotions at each tier of the hierarchy. We also demonstrate automated methods for establishing three sorts of dependencies in order to develop linkages between different emotions. Our human and automatic evaluation results show the ontology's quality. Furthermore, we describe three distinct use cases that demonstrate the applicability of our ontology.
- [83] arXiv:2401.06925 [ pdf , ps , other ]
-
Title: Modeling Latent Selection with Structural Causal ModelsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
Abstract: Selection bias is ubiquitous in real-world data, and can lead to misleading results if not dealt with properly. We introduce a conditioning operation on Structural Causal Models (SCMs) to model latent selection from a causal perspective. We show that the conditioning operation transforms an SCM with the presence of an explicit latent selection mechanism into an SCM without such selection mechanism, which partially encodes the causal semantics of the selected subpopulation according to the original SCM. Furthermore, we show that this conditioning operation preserves the simplicity, acyclicity, and linearity of SCMs, and commutes with marginalization. Thanks to these properties, combined with marginalization and intervention, the conditioning operation offers a valuable tool for conducting causal reasoning tasks within causal models where latent details have been abstracted away. We demonstrate by example how classical results of causal inference can be generalized to include selection bias and how the conditioning operation helps with modeling of real-world problems.
- [84] arXiv:2401.06979 [ pdf , ps , other ]
-
Title: Distance-aware Attention Reshaping: Enhance Generalization of Neural Solver for Large-scale Vehicle Routing ProblemsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Neural solvers based on attention mechanism have demonstrated remarkable effectiveness in solving vehicle routing problems. However, in the generalization process from small scale to large scale, we find a phenomenon of the dispersion of attention scores in existing neural solvers, which leads to poor performance. To address this issue, this paper proposes a distance-aware attention reshaping method, assisting neural solvers in solving large-scale vehicle routing problems. Specifically, without the need for additional training, we utilize the Euclidean distance information between current nodes to adjust attention scores. This enables a neural solver trained on small-scale instances to make rational choices when solving a large-scale problem. Experimental results show that the proposed method significantly outperforms existing state-of-the-art neural solvers on the large-scale CVRPLib dataset.
- [85] arXiv:2401.07056 [ pdf , other ]
-
Title: Aquarium: A Comprehensive Framework for Exploring Predator-Prey Dynamics through Multi-Agent Reinforcement Learning AlgorithmsAuthors: Michael Kölle , Yannick Erpelding , Fabian Ritz , Thomy Phan , Steffen Illium , Claudia Linnhoff-PopienComments: Accepted at ICAARTSubjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: Recent advances in Multi-Agent Reinforcement Learning have prompted the modeling of intricate interactions between agents in simulated environments. In particular, the predator-prey dynamics have captured substantial interest and various simulations been tailored to unique requirements. To prevent further time-intensive developments, we introduce Aquarium, a comprehensive Multi-Agent Reinforcement Learning environment for predator-prey interaction, enabling the study of emergent behavior. Aquarium is open source and offers a seamless integration of the PettingZoo framework, allowing a quick start with proven algorithm implementations. It features physics-based agent movement on a two-dimensional, edge-wrapping plane. The agent-environment interaction (observations, actions, rewards) and the environment settings (agent speed, prey reproduction, predator starvation, and others) are fully customizable. Besides a resource-efficient visualization, Aquarium supports to record video files, providing a visual comprehension of agent behavior. To demonstrate the environment's capabilities, we conduct preliminary studies which use PPO to train multiple prey agents to evade a predator. In accordance to the literature, we find Individual Learning to result in worse performance than Parameter Sharing, which significantly improves coordination and sample-efficiency.
- [86] arXiv:2401.07115 [ pdf , ps , other ]
-
Title: Open Models, Closed Minds? On Agents Capabilities in Mimicking Human Personalities through Open Large Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Physics and Society (physics.soc-ph)
Abstract: The emergence of unveiling human-like behaviors in Large Language Models (LLMs) has led to a closer connection between NLP and human psychology, leading to a proliferation of computational agents. Scholars have been studying the inherent personalities displayed by LLM agents and attempting to incorporate human traits and behaviors into them. However, these efforts have primarily focused on commercially-licensed LLMs, neglecting the widespread use and notable advancements seen in Open LLMs. This work aims to address this gap by conducting a comprehensive examination of the ability of agents to emulate human personalities using Open LLMs. To achieve this, we generate a set of ten LLM Agents based on the most representative Open models and subject them to a series of assessments concerning the Myers-Briggs Type Indicator (MBTI) test. Our approach involves evaluating the intrinsic personality traits of Open LLM agents and determining the extent to which these agents can mimic human personalities when conditioned by specific personalities and roles. Our findings unveil that: $(i)$ each Open LLM agent showcases distinct human personalities; $(ii)$ personality-conditioned prompting produces varying effects on the agents, with only few successfully mirroring the imposed personality, while most of them being ``closed-minded'' (i.e., they retain their intrinsic traits); $(iii)$ combining role and personality conditioning can enhance the agents' ability to mimic human personalities; and $(iv)$ personalities typically associated with the role of teacher tend to be emulated with greater accuracy. Our work represents a step up in understanding the dense relationship between NLP and human psychology through the lens of Open LLMs.
- [87] arXiv:2401.07314 [ pdf , other ]
-
Title: MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language NavigationSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Abstract: Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-making and generalization abilities across various tasks. However, existing zero-shot agents for vision-and-language navigation (VLN) only prompt the GPT-4 to select potential locations within localized environments, without constructing an effective "global-view" for the agent to understand the overall environment. In this work, we present a novel map-guided GPT-based agent, dubbed MapGPT, which introduces an online linguistic-formed map to encourage the global exploration. Specifically, we build an online map and incorporate it into the prompts that include node information and topological relationships, to help GPT understand the spatial environment. Benefiting from this design, we further propose an adaptive planning mechanism to assist the agent in performing multi-step path planning based on a map, systematically exploring multiple candidate nodes or sub-goals step by step. Extensive experiments demonstrate that our MapGPT is applicable to both GPT-4 and GPT-4V, achieving state-of-the-art zero-shot performance on the R2R and REVERIE simultaneously (~10% and ~12% improvements in SR), and showcasing the newly emerged global thinking and path planning abilities of the GPT.
- [88] arXiv:2401.07324 [ pdf , other ]
-
Title: Small LLMs Are Weak Tool Learners: A Multi-LLM AgentAuthors: Weizhou Shen , Chenliang Li , Hongzhan Chen , Ming Yan , Xiaojun Quan , Hehong Chen , Ji Zhang , Fei HuangComments: On progress, github repo: this https URLSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs, empowering them to interact with external tools (e.g., APIs, functions) and complete various tasks in a self-directed fashion. The challenge of tool use demands that LLMs not only understand user queries and generate answers accurately but also excel in task planning, tool invocation, and result summarization. While traditional works focus on training a single LLM with all these capabilities, performance limitations become apparent, particularly with smaller models. To overcome these challenges, we propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer. Each component is implemented by a single LLM that focuses on a specific capability and collaborates with others to accomplish the task. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability. To effectively train this framework, we introduce a two-stage training paradigm. First, we fine-tune a backbone LLM on the entire dataset without discriminating sub-tasks, providing the model with a comprehensive understanding of the task. Second, the fine-tuned LLM is used to instantiate the planner, caller, and summarizer respectively, which are continually fine-tuned on respective sub-tasks. Evaluation across various tool-use benchmarks illustrates that our proposed multi-LLM framework surpasses the traditional single-LLM approach, highlighting its efficacy and advantages in tool learning.
- [89] arXiv:2401.07359 [ pdf , ps , other ]
-
Title: Reliability and Interpretability in Science and Deep LearningAuthors: Luigi ScorzatoComments: Minor clarifications in Sec. 4.1. Misleading table removedSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); History and Philosophy of Physics (physics.hist-ph)
Abstract: In recent years, the question of the reliability of Machine Learning (ML) methods has acquired significant importance, and the analysis of the associated uncertainties has motivated a growing amount of research. However, most of these studies have applied standard error analysis to ML models, and in particular Deep Neural Network (DNN) models, which represent a rather significant departure from standard scientific modelling. It is therefore necessary to integrate the standard error analysis with a deeper epistemological analysis of the possible differences between DNN models and standard scientific modelling and the possible implications of these differences in the assessment of reliability. This article offers several contributions. First, it emphasises the ubiquitous role of model assumptions (both in ML and traditional Science) against the illusion of theory-free science. Secondly, model assumptions are analysed from the point of view of their (epistemic) complexity, which is shown to be language-independent. It is argued that the high epistemic complexity of DNN models hinders the estimate of their reliability and also their prospect of long-term progress. Some potential ways forward are suggested. Thirdly, this article identifies the close relation between a model's epistemic complexity and its interpretability, as introduced in the context of responsible AI. This clarifies in which sense, and to what extent, the lack of understanding of a model (black-box problem) impacts its interpretability in a way that is independent of individual skills. It also clarifies how interpretability is a precondition for assessing the reliability of any model, which cannot be based on statistical analysis alone. This article focuses on the comparison between traditional scientific models and DNN models. But, Random Forest and Logistic Regression models are also briefly considered.
- [90] arXiv:2401.07426 [ pdf , other ]
-
Title: Generalized Planning for the Abstraction and Reasoning CorpusComments: Accepted at AAAI 2024 (extended version)Subjects: Artificial Intelligence (cs.AI)
Abstract: The Abstraction and Reasoning Corpus (ARC) is a general artificial intelligence benchmark that poses difficulties for pure machine learning methods due to its requirement for fluid intelligence with a focus on reasoning and abstraction. In this work, we introduce an ARC solver, Generalized Planning for Abstract Reasoning (GPAR). It casts an ARC problem as a generalized planning (GP) problem, where a solution is formalized as a planning program with pointers. We express each ARC problem using the standard Planning Domain Definition Language (PDDL) coupled with external functions representing object-centric abstractions. We show how to scale up GP solvers via domain knowledge specific to ARC in the form of restrictions over the actions model, predicates, arguments and valid structure of planning programs. Our experiments demonstrate that GPAR outperforms the state-of-the-art solvers on the object-centric tasks of the ARC, showing the effectiveness of GP and the expressiveness of PDDL to model ARC problems. The challenges provided by the ARC benchmark motivate research to advance existing GP solvers and understand new relations with other planning computational models. Code is available at this http URL .
- [91] arXiv:2401.07448 [ pdf , other ]
-
Title: Formal Logic Enabled Personalized Federated Learning Through Property InferenceSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Recent advancements in federated learning (FL) have greatly facilitated the development of decentralized collaborative applications, particularly in the domain of Artificial Intelligence of Things (AIoT). However, a critical aspect missing from the current research landscape is the ability to enable data-driven client models with symbolic reasoning capabilities. Specifically, the inherent heterogeneity of participating client devices poses a significant challenge, as each client exhibits unique logic reasoning properties. Failing to consider these device-specific specifications can result in critical properties being missed in the client predictions, leading to suboptimal performance. In this work, we propose a new training paradigm that leverages temporal logic reasoning to address this issue. Our approach involves enhancing the training process by incorporating mechanically generated logic expressions for each FL client. Additionally, we introduce the concept of aggregation clusters and develop a partitioning algorithm to effectively group clients based on the alignment of their temporal reasoning properties. We evaluate the proposed method on two tasks: a real-world traffic volume prediction task consisting of sensory data from fifteen states and a smart city multi-task prediction utilizing synthetic data. The evaluation results exhibit clear improvements, with performance accuracy improved by up to 54% across all sequential prediction models.
- [92] arXiv:2401.07656 [ pdf , ps , other ]
-
Title: Learning Explainable and Better Performing Representations of POMDP StrategiesComments: Technical report for the submission to TACAS 24Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Abstract: Strategies for partially observable Markov decision processes (POMDP) typically require memory. One way to represent this memory is via automata. We present a method to learn an automaton representation of a strategy using a modification of the L*-algorithm. Compared to the tabular representation of a strategy, the resulting automaton is dramatically smaller and thus also more explainable. Moreover, in the learning process, our heuristics may even improve the strategy's performance. In contrast to approaches that synthesize an automaton directly from the POMDP thereby solving it, our approach is incomparably more scalable.
- [93] arXiv:2401.07710 [ pdf , ps , other ]
-
Title: Go-Explore for Residential Energy ManagementSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: Reinforcement learning is commonly applied in residential energy management, particularly for optimizing energy costs. However, RL agents often face challenges when dealing with deceptive and sparse rewards in the energy control domain, especially with stochastic rewards. In such situations, thorough exploration becomes crucial for learning an optimal policy. Unfortunately, the exploration mechanism can be misled by deceptive reward signals, making thorough exploration difficult. Go-Explore is a family of algorithms which combines planning methods and reinforcement learning methods to achieve efficient exploration. We use the Go-Explore algorithm to solve the cost-saving task in residential energy management problems and achieve an improvement of up to 19.84\% compared to the well-known reinforcement learning algorithms.
- [94] arXiv:2401.07722 [ pdf , other ]
-
Title: Inferring Preferences from Demonstrations in Multi-Objective Residential Energy ManagementSubjects: Artificial Intelligence (cs.AI)
Abstract: It is often challenging for a user to articulate their preferences accurately in multi-objective decision-making problems. Demonstration-based preference inference (DemoPI) is a promising approach to mitigate this problem. Understanding the behaviours and values of energy customers is an example of a scenario where preference inference can be used to gain insights into the values of energy customers with multiple objectives, e.g. cost and comfort. In this work, we applied the state-of-art DemoPI method, i.e., the dynamic weight-based preference inference (DWPI) algorithm in a multi-objective residential energy consumption setting to infer preferences from energy consumption demonstrations by simulated users following a rule-based approach. According to our experimental results, the DWPI model achieves accurate demonstration-based preference inferring in three scenarios. These advancements enhance the usability and effectiveness of multi-objective reinforcement learning (MORL) in energy management, enabling more intuitive and user-friendly preference specifications, and opening the door for DWPI to be applied in real-world settings.
- [95] arXiv:2401.07744 [ pdf , other ]
-
Title: Combining Machine Learning and Ontology: A Systematic Literature ReviewSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Motivated by the desire to explore the process of combining inductive and deductive reasoning, we conducted a systematic literature review of articles that investigate the integration of machine learning and ontologies. The objective was to identify diverse techniques that incorporate both inductive reasoning (performed by machine learning) and deductive reasoning (performed by ontologies) into artificial intelligence systems. Our review, which included the analysis of 128 studies, allowed us to identify three main categories of hybridization between machine learning and ontologies: learning-enhanced ontologies, semantic data mining, and learning and reasoning systems. We provide a comprehensive examination of all these categories, emphasizing the various machine learning algorithms utilized in the studies. Furthermore, we compared our classification with similar recent work in the field of hybrid AI and neuro-symbolic approaches.
- [96] arXiv:2401.07764 [ pdf , other ]
-
Title: When Large Language Model Agents Meet 6G Networks: Perception, Grounding, and AlignmentAuthors: Minrui Xu , Dusit Niyato , Jiawen Kang , Zehui Xiong , Shiwen Mao , Zhu Han , Dong In Kim , Khaled B. LetaiefSubjects: Artificial Intelligence (cs.AI) ; Networking and Internet Architecture (cs.NI)
Abstract: AI agents based on multimodal large language models (LLMs) are expected to revolutionize human-computer interaction and offer more personalized assistant services across various domains like healthcare, education, manufacturing, and entertainment. Deploying LLM agents in 6G networks enables users to access previously expensive AI assistant services via mobile devices democratically, thereby reducing interaction latency and better preserving user privacy. Nevertheless, the limited capacity of mobile devices constrains the effectiveness of deploying and executing local LLMs, which necessitates offloading complex tasks to global LLMs running on edge servers during long-horizon interactions. In this article, we propose a split learning system for LLM agents in 6G networks leveraging the collaboration between mobile devices and edge servers, where multiple LLMs with different roles are distributed across mobile devices and edge servers to perform user-agent interactive tasks collaboratively. In the proposed system, LLM agents are split into perception, grounding, and alignment modules, facilitating inter-module communications to meet extended user requirements on 6G network functions, including integrated sensing and communication, digital twins, and task-oriented communications. Furthermore, we introduce a novel model caching algorithm for LLMs within the proposed system to improve model utilization in context, thus reducing network costs of the collaborative mobile and edge LLM agents.
- [97] arXiv:2401.07871 [ pdf , other ]
-
Title: Explainable Predictive Maintenance: A Survey of Current Methods, Challenges and OpportunitiesAuthors: Logan Cummins , Alex Sommers , Somayeh Bakhtiari Ramezani , Sudip Mittal , Joseph Jabour , Maria Seale , Shahram RahimiSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: Predictive maintenance is a well studied collection of techniques that aims to prolong the life of a mechanical system by using artificial intelligence and machine learning to predict the optimal time to perform maintenance. The methods allow maintainers of systems and hardware to reduce financial and time costs of upkeep. As these methods are adopted for more serious and potentially life-threatening applications, the human operators need trust the predictive system. This attracts the field of Explainable AI (XAI) to introduce explainability and interpretability into the predictive system. XAI brings methods to the field of predictive maintenance that can amplify trust in the users while maintaining well-performing systems. This survey on explainable predictive maintenance (XPM) discusses and presents the current methods of XAI as applied to predictive maintenance while following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines. We categorize the different XPM methods into groups that follow the XAI literature. Additionally, we include current challenges and a discussion on future research directions in XPM.
- [98] arXiv:2401.07890 [ pdf , other ]
-
Title: A Strategy for Implementing description Temporal Dynamic Algorithms in Dynamic Knowledge Graphs by SPINAuthors: Alireza Shahbazi , Seyyed Ahmad Mirsanei , Malikeh Haj Khan Mirzaye Sarraf , Behrouz Minaei BidgoliSubjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: Planning and reasoning about actions and processes, in addition to reasoning about propositions, are important issues in recent logical and computer science studies. The widespread use of actions in everyday life such as IoT, semantic web services, etc., and the limitations and issues in the action formalisms are two factors that lead us to study how actions are represented.
Since 2007, there have been some ideas to integrate Description Logic (DL) and action formalisms for representing both static and dynamic knowledge. Meanwhile, time is an important factor in dynamic situations, and actions change states over time. In this study, on the one hand, we examined related logical structures such as extensions of description logics (DLs), temporal formalisms, and action formalisms. On the other hand, we analyzed possible tools for designing and developing the Knowledge and Action Base (KAB).
For representation and reasoning about actions, we embedded actions into DLs (such as Dynamic-ALC and its extensions). We propose a terminable algorithm for action projection, planning, checking the satisfiability, consistency, realizability, and executability, and also querying from KAB. Actions in this framework were modeled with SPIN and added to state space. This framework has also been implemented as a plugin for the Protégé ontology editor.
During the last two decades, various algorithms have been presented, but due to the high computational complexity, we face many problems in implementing dynamic ontologies. In addition, an algorithm to detect the inconsistency of actions' effects was not explicitly stated. In the proposed strategy, the interactions of actions with other parts of modeled knowledge, and a method to check consistency between the effects of actions are presented. With this framework, the ramification problem can be well handled in future works. - [99] arXiv:2401.07964 [ pdf , ps , other ]
-
Title: AI-as-exploration: Navigating intelligence spaceAuthors: Dimitri Coelho MolloSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Artificial Intelligence is a field that lives many lives, and the term has come to encompass a motley collection of scientific and commercial endeavours. In this paper, I articulate the contours of a rather neglected but central scientific role that AI has to play, which I dub `AI-as-exploration'.The basic thrust of AI-as-exploration is that of creating and studying systems that can reveal candidate building blocks of intelligence that may differ from the forms of human and animal intelligence we are familiar with. In other words, I suggest that AI is one of the best tools we have for exploring intelligence space, namely the space of possible intelligent systems. I illustrate the value of AI-as-exploration by focusing on a specific case study, i.e., recent work on the capacity to combine novel and invented concepts in humans and Large Language Models. I show that the latter, despite showing human-level accuracy in such a task, most probably solve it in ways radically different, but no less relevant to intelligence research, to those hypothesised for humans.
- [100] arXiv:2401.08008 [ pdf , other ]
-
Title: Analysing the Needs of Homeless People Using Feature Selection and Mining Association RulesAuthors: José M. Alcalde-Llergo , Carlos García-Martínez , Manuel Vaquero-Abellán , Pilar Aparicio-Martínez , Enrique Yeguas-BolívarComments: 6 pages, 4 figures, 4 tables, MetroXRAINE 2022Journal-ref: 2022 IEEE International Conference on Metrology for Extended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), Rome, Italy, 2022, pp. 568-573Subjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: Homelessness is a social and health problem with great repercussions in Europe. Many non-governmental organisations help homeless people by collecting and analysing large amounts of information about them. However, these tasks are not always easy to perform, and hinder other of the organisations duties. The SINTECH project was created to tackle this issue proposing two different tools: a mobile application to quickly and easily collect data; and a software based on artificial intelligence which obtains interesting information from the collected data. The first one has been distributed to some Spanish organisations which are using it to conduct surveys of homeless people. The second tool implements different feature selection and association rules mining methods. These artificial intelligence techniques have allowed us to identify the most relevant features and some interesting association rules from previously collected homeless data.
- [101] arXiv:2401.08025 [ pdf , other ]
-
Title: Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-ImaginationComments: 18 pages, 9 figures, 12 tablesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: The potential of Vision-Language Models (VLMs) often remains underutilized in handling complex text-based problems, particularly when these problems could benefit from visual representation. Resonating with humans' ability to solve complex text-based problems by (1) creating a visual diagram from the problem and (2) deducing what steps they need to take to solve it, we propose Self-Imagine. We leverage a single Vision-Language Model (VLM) to generate a structured representation of the question using HTML, then render the HTML as an image, and finally use the same VLM to answer the question using both the question and the image. Our approach does not require any additional training data or training. We evaluate our approach on three mathematics tasks and nine general-purpose reasoning tasks using state-of-the-art (LLAVA-1.5 and GEMINI PRO) VLMs. Our approach boosts the performance of LLAVA-1.5 and GEMINI PRO on all math tasks (on average GSM8K: +3.1%; ASDIV: +3.2%; SVAMP: +6.9%) and the majority of the general-purpose reasoning tasks by 3.2% to 6.0% on average.
- [102] arXiv:2401.08189 [ pdf , other ]
-
Title: PRewrite: Prompt Rewriting with Reinforcement LearningSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Prompt engineering is critical for the development of LLM-based applications. However, it is usually done manually in a "trial and error" fashion that can be time consuming, ineffective, and sub-optimal. Even for the prompts which seemingly work well, there is always a lingering question: can the prompts be made better with further modifications?
To address these problems, we investigate automated prompt engineering in this paper. Specifically, we propose PRewrite, an automated method to rewrite an under-optimized prompt to a more effective prompt. We instantiate the prompt rewriter using a LLM. The rewriter LLM is trained using reinforcement learning to optimize the performance on a given downstream task. We conduct experiments on diverse benchmark datasets, which demonstrates the effectiveness of PRewrite. - [103] arXiv:2401.08460 [ pdf , other ]
-
Title: Reinforcement Learning for Conversational Question Answering over Knowledge GraphAuthors: Mi WuSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Conversational question answering (ConvQA) over law knowledge bases (KBs) involves answering multi-turn natural language questions about law and hope to find answers in the law knowledge base. Despite many methods have been proposed. Existing law knowledge base ConvQA model assume that the input question is clear and can perfectly reflect user's intention. However, in real world, the input questions are noisy and inexplict. This makes the model hard to find the correct answer in the law knowledge bases. In this paper, we try to use reinforcement learning to solve this problem. The reinforcement learning agent can automatically learn how to find the answer based on the input question and the conversation history, even when the input question is inexplicit. We test the proposed method on several real world datasets and the results show the effectivenss of the proposed model.
- [104] arXiv:2401.08461 [ pdf , other ]
-
Title: Decentralised Emergence of Robust and Adaptive Linguistic Conventions in Populations of Autonomous Agents Grounded in Continuous WorldsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE)
Abstract: This paper introduces a methodology through which a population of autonomous agents can establish a linguistic convention that enables them to refer to arbitrary entities that they observe in their environment. The linguistic convention emerges in a decentralised manner through local communicative interactions between pairs of agents drawn from the population. The convention consists of symbolic labels (word forms) associated to concept representations (word meanings) that are grounded in a continuous feature space. The concept representations of each agent are individually constructed yet compatible on a communicative level. Through a range of experiments, we show (i) that the methodology enables a population to converge on a communicatively effective, coherent and human-interpretable linguistic convention, (ii) that it is naturally robust against sensor defects in individual agents, (iii) that it can effectively deal with noisy observations, uncalibrated sensors and heteromorphic populations, (iv) that the method is adequate for continual learning, and (v) that the convention self-adapts to changes in the environment and communicative needs of the agents.
- [105] arXiv:2401.08517 [ pdf , other ]
-
Title: Supporting Student Decisions on Learning Recommendations: An LLM-Based Chatbot with Knowledge Graph Contextualization for Conversational Explainability and MentoringSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: Student commitment towards a learning recommendation is not separable from their understanding of the reasons it was recommended to them; and their ability to modify it based on that understanding. Among explainability approaches, chatbots offer the potential to engage the student in a conversation, similar to a discussion with a peer or a mentor. The capabilities of chatbots, however, are still not sufficient to replace a human mentor, despite the advancements of generative AI (GenAI) and large language models (LLM). Therefore, we propose an approach to utilize chatbots as mediators of the conversation and sources of limited and controlled generation of explanations, to harvest the potential of LLMs while reducing their potential risks at the same time. The proposed LLM-based chatbot supports students in understanding learning-paths recommendations. We use a knowledge graph (KG) as a human-curated source of information, to regulate the LLM's output through defining its prompt's context. A group chat approach is developed to connect students with human mentors, either on demand or in cases that exceed the chatbot's pre-defined tasks. We evaluate the chatbot with a user study, to provide a proof-of-concept and highlight the potential requirements and limitations of utilizing chatbots in conversational explainability.
- [106] arXiv:2401.08525 [ pdf , other ]
-
Title: GATS: Gather-Attend-ScatterAuthors: Konrad Zolna , Serkan Cabi , Yutian Chen , Eric Lau , Claudio Fantacci , Jurgis Pasukonis , Jost Tobias Springenberg , Sergio Gomez ColmenarejoSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: As the AI community increasingly adopts large-scale models, it is crucial to develop general and flexible tools to integrate them. We introduce Gather-Attend-Scatter (GATS), a novel module that enables seamless combination of pretrained foundation models, both trainable and frozen, into larger multimodal networks. GATS empowers AI systems to process and generate information across multiple modalities at different rates. In contrast to traditional fine-tuning, GATS allows for the original component models to remain frozen, avoiding the risk of them losing important knowledge acquired during the pretraining phase. We demonstrate the utility and versatility of GATS with a few experiments across games, robotics, and multimodal input-output systems.
- [107] arXiv:2401.08660 [ pdf , other ]
-
Title: Gemini Pro Defeated by GPT-4V: Evidence from EducationSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: This study compared the classification performance of Gemini Pro and GPT-4V in educational settings. Employing visual question answering (VQA) techniques, the study examined both models' abilities to read text-based rubrics and then automatically score student-drawn models in science education. We employed both quantitative and qualitative analyses using a dataset derived from student-drawn scientific models and employing NERIF (Notation-Enhanced Rubrics for Image Feedback) prompting methods. The findings reveal that GPT-4V significantly outperforms Gemini Pro in terms of scoring accuracy and Quadratic Weighted Kappa. The qualitative analysis reveals that the differences may be due to the models' ability to process fine-grained texts in images and overall image classification performance. Even adapting the NERIF approach by further de-sizing the input images, Gemini Pro seems not able to perform as well as GPT-4V. The findings suggest GPT-4V's superior capability in handling complex multimodal educational tasks. The study concludes that while both models represent advancements in AI, GPT-4V's higher performance makes it a more suitable tool for educational applications involving multimodal data interpretation.
- [108] arXiv:2401.08663 [ pdf , other ]
-
Title: An Integrated Imitation and Reinforcement Learning Methodology for Robust Agile Aircraft Control with Limited Pilot Demonstration DataAuthors: Gulay Goktas Sever , Umut Demir , Abdullah Sadik Satir , Mustafa Cagatay Sahin , Nazim Kemal UreComments: Preprint submitted to Aerospace Science and TechnologySubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Robotics (cs.RO); Systems and Control (eess.SY)
Abstract: In this paper, we present a methodology for constructing data-driven maneuver generation models for agile aircraft that can generalize across a wide range of trim conditions and aircraft model parameters. Maneuver generation models play a crucial role in the testing and evaluation of aircraft prototypes, providing insights into the maneuverability and agility of the aircraft. However, constructing the models typically requires extensive amounts of real pilot data, which can be time-consuming and costly to obtain. Moreover, models built with limited data often struggle to generalize beyond the specific flight conditions covered in the original dataset. To address these challenges, we propose a hybrid architecture that leverages a simulation model, referred to as the source model. This open-source agile aircraft simulator shares similar dynamics with the target aircraft and allows us to generate unlimited data for building a proxy maneuver generation model. We then fine-tune this model to the target aircraft using a limited amount of real pilot data. Our approach combines techniques from imitation learning, transfer learning, and reinforcement learning to achieve this objective. To validate our methodology, we utilize real agile pilot data provided by Turkish Aerospace Industries (TAI). By employing the F-16 as the source model, we demonstrate that it is possible to construct a maneuver generation model that generalizes across various trim conditions and aircraft parameters without requiring any additional real pilot data. Our results showcase the effectiveness of our approach in developing robust and adaptable models for agile aircraft.
- [109] arXiv:2401.08664 [ pdf , other ]
-
Title: Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and ChallengesAuthors: Qingyao Li , Lingyue Fu , Weiming Zhang , Xianyu Chen , Jingwei Yu , Wei Xia , Weinan Zhang , Ruiming Tang , Yong YuComments: 13 pages, 3 figures, 1 tableSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Online education platforms, leveraging the internet to distribute education resources, seek to provide convenient education but often fall short in real-time communication with students. They often struggle to offer personalized education resources due to the challenge of addressing the diverse obstacles students encounter throughout their learning journey. Recently, the emergence of large language models (LLMs), such as ChatGPT, offers the possibility for resolving this issue by comprehending individual requests. Although LLMs have been successful in various fields, creating an LLM-based education system is still challenging for the wide range of educational skills required. This paper reviews the recently emerged LLM researches related to educational capabilities, including mathematics, writing, programming, reasoning, and knowledge-based question answering, with the aim to explore their potential in constructing the next-generation intelligent education system. Based on the current development status, we further outline two approaches for an LLM-based education system: a unified approach and a mixture-of-expert (MoE) approach. Finally, we explore the challenges and future directions, providing new research opportunities and perspectives on adapting LLMs for education.
- [110] arXiv:2401.08695 [ pdf , other ]
-
Title: Enabling Collaborative Clinical Diagnosis of Infectious Keratitis by Integrating Expert Knowledge and Interpretable Data-driven IntelligenceAuthors: Zhengqing Fang , Shuowen Zhou , Zhouhang Yuan , Yuxuan Si , Mengze Li , Jinxu Li , Yesheng Xu , Wenjia Xie , Kun Kuang , Yingming Li , Fei Wu , Yu-Feng YaoComments: 33 pagesSubjects: Artificial Intelligence (cs.AI) ; Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Abstract: Although data-driven artificial intelligence (AI) in medical image diagnosis has shown impressive performance in silico, the lack of interpretability makes it difficult to incorporate the "black box" into clinicians' workflows. To make the diagnostic patterns learned from data understandable by clinicians, we develop an interpretable model, knowledge-guided diagnosis model (KGDM), that provides a visualized reasoning process containing AI-based biomarkers and retrieved cases that with the same diagnostic patterns. It embraces clinicians' prompts into the interpreted reasoning through human-AI interaction, leading to potentially enhanced safety and more accurate predictions. This study investigates the performance, interpretability, and clinical utility of KGDM in the diagnosis of infectious keratitis (IK), which is the leading cause of corneal blindness. The classification performance of KGDM is evaluated on a prospective validation dataset, an external testing dataset, and an publicly available testing dataset. The diagnostic odds ratios (DOR) of the interpreted AI-based biomarkers are effective, ranging from 3.011 to 35.233 and exhibit consistent diagnostic patterns with clinic experience. Moreover, a human-AI collaborative diagnosis test is conducted and the participants with collaboration achieved a performance exceeding that of both humans and AI. By synergistically integrating interpretability and interaction, this study facilitates the convergence of clinicians' expertise and data-driven intelligence. The promotion of inexperienced ophthalmologists with the aid of AI-based biomarkers, as well as increased AI prediction by intervention from experienced ones, demonstrate a promising diagnostic paradigm for infectious keratitis using KGDM, which holds the potential for extension to other diseases where experienced medical practitioners are limited and the safety of AI is concerned.
- [111] arXiv:2401.08743 [ pdf , other ]
-
Title: MMToM-QA: Multimodal Theory of Mind Question AnsweringAuthors: Chuanyang Jin , Yutong Wu , Jing Cao , Jiannan Xiang , Yen-Ling Kuo , Zhiting Hu , Tomer Ullman , Antonio Torralba , Joshua B. Tenenbaum , Tianmin ShuComments: 27 pages, 11 figures, 7 tablesSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Theory of Mind (ToM), the ability to understand people's minds, is an essential ingredient for developing machines with human-level social intelligence. Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding. However, existing ToM benchmarks use unimodal datasets - either video or text. Human ToM, on the other hand, is more than video or text understanding. People can flexibly reason about another person's mind based on conceptual representations (e.g., goals, beliefs, plans) extracted from any available data, which can include visual cues, linguistic narratives, or both. To address this, we introduce a multimodal Theory of Mind question answering (MMToM-QA) benchmark. MMToM-QA comprehensively evaluates machine ToM both on multimodal data and on different kinds of unimodal data about a person's activity in a household environment. To engineer multimodal ToM capacity, we propose a novel method, BIP-ALM (Bayesian Inverse Planning Accelerated by Language Models). BIP-ALM extracts unified representations from multimodal data and utilizes language models for scalable Bayesian inverse planning. We conducted a systematic comparison of human performance, BIP-ALM, and state-of-the-art models, including GPT-4. The experiments demonstrate that large language models and large multimodal models still lack robust ToM capacity. BIP-ALM, on the other hand, shows promising results, by leveraging the power of both model-based mental inference and language models.
- [112] arXiv:2401.08879 [ pdf , other ]
-
Title: Contribution Functions for Quantitative Bipolar Argumentation Graphs: A Principle-based AnalysisSubjects: Artificial Intelligence (cs.AI)
Abstract: We present a principle-based analysis of contribution functions for quantitative bipolar argumentation graphs that quantify the contribution of one argument to another. The introduced principles formalise the intuitions underlying different contribution functions as well as expectations one would have regarding the behaviour of contribution functions in general. As none of the covered contribution functions satisfies all principles, our analysis can serve as a tool that enables the selection of the most suitable function based on the requirements of a given use case.
- [113] arXiv:2401.08936 [ pdf , other ]
-
Title: DeLF: Designing Learning Environments with Foundation ModelsComments: AAAI 2024 Workshop on Synergy of Reinforcement Learning and Large Language ModelsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Reinforcement learning (RL) offers a capable and intuitive structure for the fundamental sequential decision-making problem. Despite impressive breakthroughs, it can still be difficult to employ RL in practice in many simple applications. In this paper, we try to address this issue by introducing a method for designing the components of the RL environment for a given, user-intended application. We provide an initial formalization for the problem of RL component design, that concentrates on designing a good representation for observation and action space. We propose a method named DeLF: Designing Learning Environments with Foundation Models, that employs large language models to design and codify the user's intended learning scenario. By testing our method on four different learning environments, we demonstrate that DeLF can obtain executable environment codes for the corresponding RL problems.
- [114] arXiv:2401.08999 [ pdf , other ]
-
Title: Continuous Time Continuous Space Homeostatic Reinforcement Learning (CTCS-HRRL) : Towards Biological Self-Autonomous AgentAuthors: Hugo Laurencon , Yesoda Bhargava , Riddhi Zantye , Charbel-Raphaël Ségerie , Johann Lussange , Veeky Baths , Boris GutkinComments: This work is a result of the ongoing collaboration between Cognitive Neuroscience Lab, BITS Pilani K K Birla Goa Campus and Ecole Normale Superieure, Paris France. This work is jointly supervised by Prof. Boris Gutkin and Prof. Veeky Baths. arXiv admin note: substantial text overlap with arXiv:2109.06580Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Homeostasis is a biological process by which living beings maintain their internal balance. Previous research suggests that homeostasis is a learned behaviour. Recently introduced Homeostatic Regulated Reinforcement Learning (HRRL) framework attempts to explain this learned homeostatic behavior by linking Drive Reduction Theory and Reinforcement Learning. This linkage has been proven in the discrete time-space, but not in the continuous time-space. In this work, we advance the HRRL framework to a continuous time-space environment and validate the CTCS-HRRL (Continuous Time Continuous Space HRRL) framework. We achieve this by designing a model that mimics the homeostatic mechanisms in a real-world biological agent. This model uses the Hamilton-Jacobian Bellman Equation, and function approximation based on neural networks and Reinforcement Learning. Through a simulation-based experiment we demonstrate the efficacy of this model and uncover the evidence linked to the agent's ability to dynamically choose policies that favor homeostasis in a continuously changing internal-state milieu. Results of our experiments demonstrate that agent learns homeostatic behaviour in a CTCS environment, making CTCS-HRRL a promising framework for modellng animal dynamics and decision-making.
- [115] arXiv:2401.09042 [ pdf , other ]
-
Title: LLMs for Relational Reasoning: How Far are We?Authors: Zhiming Li , Yushi Cao , Xiufeng Xu , Junzhe Jiang , Xu Liu , Yon Shin Teo , Shang-wei Lin , Yang LiuComments: Accepted by The First International Workshop on Large Language Models for Code (ICSE 2024)Subjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Large language models (LLMs) have revolutionized many areas (e.g. natural language processing, software engineering, etc.) by achieving state-of-the-art performance on extensive downstream tasks. Aiming to achieve robust and general artificial intelligence, there has been a surge of interest in investigating the reasoning ability of the LLMs. Whereas the textual and numerical reasoning benchmarks adopted by previous works are rather shallow and simple, it is hard to conclude that the LLMs possess strong reasoning ability by merely achieving positive results on these benchmarks. Recent efforts have demonstrated that the LLMs are poor at solving sequential decision-making problems that require common-sense planning by evaluating their performance on the reinforcement learning benchmarks. In this work, we conduct an in-depth assessment of several state-of-the-art LLMs' reasoning ability based on the inductive logic programming (ILP) benchmark, which is broadly recognized as a representative and challenging measurement for evaluating logic program induction/synthesis systems as it requires inducing strict cause-effect logic to achieve robust deduction on independent and identically distributed (IID) and out-of-distribution (OOD) test samples. Our evaluations illustrate that compared with the neural program induction systems which are much smaller in model size, the state-of-the-art LLMs are much poorer in terms of reasoning ability by achieving much lower performance and generalization using either natural language prompting or truth-value matrix prompting.
- [116] arXiv:2401.09070 [ pdf , ps , other ]
-
Title: Knowledge Pyramid: A Novel Hierarchical Reasoning Structure for Generalized Knowledge Augmentation and InferenceComments: 10 pages,8 figuresSubjects: Artificial Intelligence (cs.AI) ; Information Retrieval (cs.IR)
Abstract: Knowledge graph (KG) based reasoning has been regarded as an effective means for the analysis of semantic networks and is of great usefulness in areas of information retrieval, recommendation, decision-making, and man-machine interaction. It is widely used in recommendation, decision-making, question-answering, search, and other fields. However, previous studies mainly used low-level knowledge in the KG for reasoning, which may result in insufficient generalization and poor robustness of reasoning. To this end, this paper proposes a new inference approach using a novel knowledge augmentation strategy to improve the generalization capability of KG. This framework extracts high-level pyramidal knowledge from low-level knowledge and applies it to reasoning in a multi-level hierarchical KG, called knowledge pyramid in this paper. We tested some medical data sets using the proposed approach, and the experimental results show that the proposed knowledge pyramid has improved the knowledge inference performance with better generalization. Especially, when there are fewer training samples, the inference accuracy can be significantly improved.
- [117] arXiv:2401.09444 [ pdf , other ]
-
Title: Online Handbook of Argumentation for AI: Volume 4Authors: Lars Bengel , Lydia Blümel , Elfia Bezou-Vrakatseli , Federico Castagna , Giulia D'Agostino , Isabelle Kuhlmann , Jack Mumford , Daphne Odekerken , Fabrizio Russo , Stefan Sarkadi , Madeleine Waller , Andreas XydisSubjects: Artificial Intelligence (cs.AI)
Abstract: This volume contains revised versions of the papers selected for the fourth volume of the Online Handbook of Argumentation for AI (OHAAI). Previously, formal theories of argument and argument interaction have been proposed and studied, and this has led to the more recent study of computational models of argument. Argumentation, as a field within artificial intelligence (AI), is highly relevant for researchers interested in symbolic representations of knowledge and defeasible reasoning. The purpose of this handbook is to provide an open access and curated anthology for the argumentation research community. OHAAI is designed to serve as a research hub to keep track of the latest and upcoming PhD-driven research on the theory and application of argumentation in all areas related to AI.
- [118] arXiv:2401.09448 [ pdf , other ]
-
Title: Tumbug: A pictorial, universal knowledge representation methodAuthors: Mark A. AtkinsComments: 346 pages, 334 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Since the key to artificial general intelligence (AGI) is commonly believed to be commonsense reasoning (CSR) or, roughly equivalently, discovery of a knowledge representation method (KRM) that is particularly suitable for CSR, the author developed a custom KRM for CSR. This novel KRM called Tumbug was designed to be pictorial in nature because there exists increasing evidence that the human brain uses some pictorial type of KRM, and no well-known prior research in AGI has researched this KRM possibility. Tumbug is somewhat similar to Roger Schank's Conceptual Dependency (CD) theory, but Tumbug is pictorial and uses about 30 components based on fundamental concepts from the sciences and human life, in contrast to CD theory, which is textual and uses about 17 components (= 6 Primitive Conceptual Categories + 11 Primitive Acts) based mainly on human-oriented activities. All the Building Blocks of Tumbug were found to generalize to only five Basic Building Blocks that exactly correspond to the three components {O, A, V} of traditional Object-Attribute-Value representation plus two new components {C, S}, which are Change and System. Collectively this set of five components, called "SCOVA," seems to be a universal foundation for all knowledge representation.
- [119] arXiv:2401.09491 [ pdf , ps , other ]
-
Title: Memory, Space, and Planning: Multiscale Predictive RepresentationsAuthors: Ida MomennejadComments: To be published as a chapter in an edited volume by Oxford University Press (Editors: Sara Aronowitz and Lynn Nadel)Subjects: Artificial Intelligence (cs.AI)
Abstract: Memory is inherently entangled with prediction and planning. Flexible behavior in biological and artificial agents depends on the interplay of learning from the past and predicting the future in ever-changing environments. This chapter reviews computational, behavioral, and neural evidence suggesting these processes rely on learning the relational structure of experiences, known as cognitive maps, and draws two key takeaways. First, that these memory structures are organized as multiscale, compact predictive representations in hippocampal and prefrontal cortex, or PFC, hierarchies. Second, we argue that such predictive memory structures are crucial to the complementary functions of the hippocampus and PFC, both for enabling a recall of detailed and coherent past episodes as well as generalizing experiences at varying scales for efficient prediction and planning. These insights advance our understanding of memory and planning mechanisms in the brain and hold significant implications for advancing artificial intelligence systems.
- [120] arXiv:2401.09680 [ pdf , ps , other ]
-
Title: Tiny Multi-Agent DRL for Twins Migration in UAV Metaverses: A Multi-Leader Multi-Follower Stackelberg Game ApproachAuthors: Jiawen Kang , Yue Zhong , Minrui Xu , Jiangtian Nie , Jinbo Wen , Hongyang Du , Dongdong Ye , Xumin Huang , Dusit Niyato , Shengli XieSubjects: Artificial Intelligence (cs.AI) ; Computer Science and Game Theory (cs.GT)
Abstract: The synergy between Unmanned Aerial Vehicles (UAVs) and metaverses is giving rise to an emerging paradigm named UAV metaverses, which create a unified ecosystem that blends physical and virtual spaces, transforming drone interaction and virtual exploration. UAV Twins (UTs), as the digital twins of UAVs that revolutionize UAV applications by making them more immersive, realistic, and informative, are deployed and updated on ground base stations, e.g., RoadSide Units (RSUs), to offer metaverse services for UAV Metaverse Users (UMUs). Due to the dynamic mobility of UAVs and limited communication coverages of RSUs, it is essential to perform real-time UT migration to ensure seamless immersive experiences for UMUs. However, selecting appropriate RSUs and optimizing the required bandwidth is challenging for achieving reliable and efficient UT migration. To address the challenges, we propose a tiny machine learning-based Stackelberg game framework based on pruning techniques for efficient UT migration in UAV metaverses. Specifically, we formulate a multi-leader multi-follower Stackelberg model considering a new immersion metric of UMUs in the utilities of UAVs. Then, we design a Tiny Multi-Agent Deep Reinforcement Learning (Tiny MADRL) algorithm to obtain the tiny networks representing the optimal game solution. Specifically, the actor-critic network leverages the pruning techniques to reduce the number of network parameters and achieve model size and computation reduction, allowing for efficient implementation of Tiny MADRL. Numerical results demonstrate that our proposed schemes have better performance than traditional schemes.
- [121] arXiv:2401.09789 [ pdf , ps , other ]
-
Title: A Semantic Approach for Big Data Exploration in Industry 4.0Comments: Published version of paper: Idoia Berges, V\'ictor Julio Ram\'irez-Dur\'an, Arantza Illarramendi: A Semantic Approach for Big Data Exploration in Industry 4.0. Big Data Res. 25: 100222 (2021). DOI: 10.1016/j.bdr.2021.100222Journal-ref: Big Data Res. 25: 100222 (2021)Subjects: Artificial Intelligence (cs.AI) ; Databases (cs.DB)
Abstract: The growing trends in automation, Internet of Things, big data and cloud computing technologies have led to the fourth industrial revolution (Industry 4.0), where it is possible to visualize and identify patterns and insights, which results in a better understanding of the data and can improve the manufacturing process. However, many times, the task of data exploration results difficult for manufacturing experts because they might be interested in analyzing also data that does not appear in pre-designed visualizations and therefore they must be assisted by Information Technology experts. In this paper, we present a proposal materialized in a semantic-based visual query system developed for a real Industry 4.0 scenario that allows domain experts to explore and visualize data in a friendly way. The main novelty of the system is the combined use that it makes of captured data that are semantically annotated first, and a 2D customized digital representation of a machine that is also linked with semantic descriptions. Those descriptions are expressed using terms of an ontology, where, among others, the sensors that are used to capture indicators about the performance of a machine that belongs to a Industry 4.0 scenario have been modeled. Moreover, this semantic description allows to: formulate queries at a higher level of abstraction, provide customized graphical visualizations of the results based on the format and nature of the data, and download enriched data enabling further types of analysis.
- [122] arXiv:2401.09851 [ pdf , other ]
-
Title: Sophisticated Behavioral Simulation: A Possible Solution to Problems of Organized ComplexityAuthors: Cheng Wang , Chuwen Wang , Yu Zhao , Wang Zhang , Shirong Zeng , Ronghui Ning , Changjun JiangSubjects: Artificial Intelligence (cs.AI)
Abstract: Simulation technologies have been widely utilized in many scientific research fields such as weather forecasting, fluid mechanics, and biological populations. As a matter of facts, they act as the best tool to handle problems in complex systems where closed-form expressions are unavailable and the target distribution in the representation space is too complex to be fully represented by data-driven learning models, such as deep learning (DL) models. This paper investigates the effectiveness and preference of simulation technologies based on the analyses of scientific paradigms and problems. We revisit the evolution of scientific paradigms from the perspective of data, algorithms, and computational power, and rethink a classic classification of scientific problems which consists of the problems of organized simplicity, problems of disorganized complexity, and problems of organized complexity. These different problems reflect the strengths of different paradigms, indicating that a new simulation technology integrating different paradigms is required to deal with unresolved problems of organized complexity in more complex systems. Therefore, we summarize existent simulation technologies aligning with the scientific paradigms, and propose the concept of behavioral simulation (BS), and further sophisticated behavioral simulation (SBS). They represent a higher degree of paradigms integration based on foundation models to simulate complex social systems involving sophisticated human strategies and behaviors. Beyond the capacity of traditional agent-based modeling simulation (ABMS), BS and further SBS are designed to tackle challenges concerning the complex human system, which can be regarded as a possible next paradigm for science. Through this work, we look forward to more powerful BS and SBS applications in scientific research branches within social science.
- [123] arXiv:2401.09966 [ pdf , other ]
-
Title: Towards Generative Abstract Reasoning: Completing Raven's Progressive Matrix via Rule Abstraction and SelectionSubjects: Artificial Intelligence (cs.AI)
Abstract: Endowing machines with abstract reasoning ability has been a long-term research topic in artificial intelligence. Raven's Progressive Matrix (RPM) is widely used to probe abstract visual reasoning in machine intelligence, where models will analyze the underlying rules and select one image from candidates to complete the image matrix. Participators of RPM tests can show powerful reasoning ability by inferring and combining attribute-changing rules and imagining the missing images at arbitrary positions of a matrix. However, existing solvers can hardly manifest such an ability in realistic RPM tests. In this paper, we propose a deep latent variable model for answer generation problems through Rule AbstractIon and SElection (RAISE). RAISE can encode image attributes into latent concepts and abstract atomic rules that act on the latent concepts. When generating answers, RAISE selects one atomic rule out of the global knowledge set for each latent concept to constitute the underlying rule of an RPM. In the experiments of bottom-right and arbitrary-position answer generation, RAISE outperforms the compared solvers in most configurations of realistic RPM datasets. In the odd-one-out task and two held-out configurations, RAISE can leverage acquired latent concepts and atomic rules to find the rule-breaking image in a matrix and handle problems with unseen combinations of rules and attributes.
- [124] arXiv:2401.10101 [ pdf , other ]
-
Title: Counterfactual Reasoning with Probabilistic Graphical Models for Analyzing Socioecological SystemsComments: 34 pagesSubjects: Artificial Intelligence (cs.AI) ; Probability (math.PR); Applications (stat.AP)
Abstract: Causal and counterfactual reasoning are emerging directions in data science that allow us to reason about hypothetical scenarios. This is particularly useful in domains where experimental data are usually not available. In the context of environmental and ecological sciences, causality enables us, for example, to predict how an ecosystem would respond to hypothetical interventions. A structural causal model is a class of probabilistic graphical models for causality, which, due to its intuitive nature, can be easily understood by experts in multiple fields. However, certain queries, called unidentifiable, cannot be calculated in an exact and precise manner. This paper proposes applying a novel and recent technique for bounding unidentifiable queries within the domain of socioecological systems. Our findings indicate that traditional statistical analysis, including probabilistic graphical models, can identify the influence between variables. However, such methods do not offer insights into the nature of the relationship, specifically whether it involves necessity or sufficiency. This is where counterfactual reasoning becomes valuable.
- [125] arXiv:2401.10420 [ pdf , other ]
-
Title: Generalized Nested Rollout Policy Adaptation with Limited RepetitionsAuthors: Tristan CazenaveSubjects: Artificial Intelligence (cs.AI)
Abstract: Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for optimizing a sequence of choices. We propose to improve on GNRPA by avoiding too deterministic policies that find again and again the same sequence of choices. We do so by limiting the number of repetitions of the best sequence found at a given level. Experiments show that it improves the algorithm for three different combinatorial problems: Inverse RNA Folding, the Traveling Salesman Problem with Time Windows and the Weak Schur problem.
- [126] arXiv:2401.10428 [ pdf , ps , other ]
-
Title: Understanding Learning through the Lens of Dynamical InvariantsAuthors: Alex UshveridzeComments: 19 pagesSubjects: Artificial Intelligence (cs.AI) ; Information Theory (cs.IT)
Abstract: This paper proposes a novel perspective on learning, positing it as the pursuit of dynamical invariants -- data combinations that remain constant or exhibit minimal change over time as a system evolves. This concept is underpinned by both informational and physical principles, rooted in the inherent properties of these invariants. Firstly, their stability makes them ideal for memorization and integration into associative networks, forming the basis of our knowledge structures. Secondly, the predictability of these stable invariants makes them valuable sources of usable energy, quantifiable as kTln2 per bit of accurately predicted information. This energy can be harnessed to explore new transformations, rendering learning systems energetically autonomous and increasingly effective. Such systems are driven to continuously seek new data invariants as energy sources. The paper further explores several meta-architectures of autonomous, self-propelled learning agents that utilize predictable information patterns as a source of usable energy.
- [127] arXiv:2401.10431 [ pdf , other ]
-
Title: Learning a Prior for Monte Carlo Search by Replaying Solutions to Combinatorial ProblemsAuthors: Tristan CazenaveSubjects: Artificial Intelligence (cs.AI)
Abstract: Monte Carlo Search gives excellent results in multiple difficult combinatorial problems. Using a prior to perform non uniform playouts during the search improves a lot the results compared to uniform playouts. Handmade heuristics tailored to the combinatorial problem are often used as priors. We propose a method to automatically compute a prior. It uses statistics on solved problems. It is a simple and general method that incurs no computational cost at playout time and that brings large performance gains. The method is applied to three difficult combinatorial problems: Latin Square Completion, Kakuro, and Inverse RNA Folding.
- [128] arXiv:2401.10444 [ pdf , ps , other ]
-
Title: Can A Cognitive Architecture Fundamentally Enhance LLMs? Or Vice Versa?Authors: Ron SunSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: The paper discusses what is needed to address the limitations of current LLM-centered AI systems. The paper argues that incorporating insights from human cognition and psychology, as embodied by a computational cognitive architecture, can help develop systems that are more capable, more reliable, and more human-like. It emphasizes the importance of the dual-process architecture and the hybrid neuro-symbolic approach in addressing the limitations of current LLMs. In the opposite direction, the paper also highlights the need for an overhaul of computational cognitive architectures to better reflect advances in AI and computing technology. Overall, the paper advocates for a multidisciplinary, mutually beneficial approach towards developing better models both for AI and for understanding the human mind.
- [129] arXiv:2401.10467 [ pdf , other ]
-
Title: Learning Backdoors for Mixed Integer Programs with Contrastive LearningSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract: Many real-world problems can be efficiently modeled as Mixed Integer Programs (MIPs) and solved with the Branch-and-Bound method. Prior work has shown the existence of MIP backdoors, small sets of variables such that prioritizing branching on them when possible leads to faster running times. However, finding high-quality backdoors that improve running times remains an open question. Previous work learns to estimate the relative solver speed of randomly sampled backdoors through ranking and then decide whether to use it. In this paper, we utilize the Monte-Carlo tree search method to collect backdoors for training, rather than relying on random sampling, and adapt a contrastive learning framework to train a Graph Attention Network model to predict backdoors. Our method, evaluated on four common MIP problem domains, demonstrates performance improvements over both Gurobi and previous models.
- [130] arXiv:2401.10568 [ pdf , other ]
-
Title: CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making AgentsAuthors: Siyuan Qi , Shuo Chen , Yexin Li , Xiangyu Kong , Junqi Wang , Bangcheng Yang , Pring Wong , Yifan Zhong , Xiaoyuan Zhang , Zhaowei Zhang , Nian Liu , Wei Wang , Yaodong Yang , Song-Chun ZhuSubjects: Artificial Intelligence (cs.AI)
Abstract: The generalization of decision-making agents encompasses two fundamental elements: learning from past experiences and reasoning in novel contexts. However, the predominant emphasis in most interactive environments is on learning, often at the expense of complexity in reasoning. In this paper, we introduce CivRealm, an environment inspired by the Civilization game. Civilization's profound alignment with human history and society necessitates sophisticated learning, while its ever-changing situations demand strong reasoning to generalize. Particularly, CivRealm sets up an imperfect-information general-sum game with a changing number of players; it presents a plethora of complex features, challenging the agent to deal with open-ended stochastic environments that require diplomacy and negotiation skills. Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning. To catalyze further research, we present initial results for both paradigms. The canonical RL-based agents exhibit reasonable performance in mini-games, whereas both RL- and LLM-based agents struggle to make substantial progress in the full game. Overall, CivRealm stands as a unique learning and reasoning challenge for decision-making agents. The code is available at this https URL .
- [131] arXiv:2401.10589 [ pdf , other ]
-
Title: Rethinking the Soft Conflict Pseudo Boolean Constraint on MaxSAT Local Search SolversSubjects: Artificial Intelligence (cs.AI)
Abstract: MaxSAT is an optimization version of the famous NP-complete Satisfiability problem (SAT). Algorithms for MaxSAT mainly include complete solvers and local search incomplete solvers. In many complete solvers, once a better solution is found, a Soft conflict Pseudo Boolean (SPB) constraint will be generated to enforce the algorithm to find better solutions. In many local search algorithms, clause weighting is a key technique for effectively guiding the search directions. In this paper, we propose to transfer the SPB constraint into the clause weighting system of the local search method, leading the algorithm to better solutions. We further propose an adaptive clause weighting strategy that breaks the tradition of using constant values to adjust clause weights. Based on the above methods, we propose a new local search algorithm called SPB-MaxSAT that provides new perspectives for clause weighting on MaxSAT local search solvers. Extensive experiments demonstrate the excellent performance of the proposed methods.
- [132] arXiv:2401.10744 [ pdf , other ]
-
Title: FinLLMs: A Framework for Financial Reasoning Dataset Generation with Large Language ModelsComments: Under submission of IEEE TransactionsSubjects: Artificial Intelligence (cs.AI)
Abstract: Large Language models (LLMs) usually rely on extensive training datasets. In the financial domain, creating numerical reasoning datasets that include a mix of tables and long text often involves substantial manual annotation expenses. To address the limited data resources and reduce the annotation cost, we introduce FinLLMs, a method for generating financial question-answering data based on common financial formulas using Large Language Models. First, we compile a list of common financial formulas and construct a graph based on the variables these formulas employ. We then augment the formula set by combining those that share identical variables as new elements. Specifically, we explore formulas obtained by manual annotation and merge those formulas with shared variables by traversing the constructed graph. Finally, utilizing GPT-3.5, we generate financial question-answering data that encompasses both tabular information and long textual content, building on the collected formula set. Our experiments demonstrate that synthetic data generated by FinLLMs effectively enhances the performance of several large-scale numerical reasoning models in the financial domain, outperforming two established benchmark financial question-answering datasets.
- [133] arXiv:2401.10751 [ pdf , other ]
-
Title: EFO: the Emotion Frame OntologySubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY); Symbolic Computation (cs.SC)
Abstract: Emotions are a subject of intense debate in various disciplines. Despite the proliferation of theories and definitions, there is still no consensus on what emotions are, and how to model the different concepts involved when we talk about - or categorize - them. In this paper, we propose an OWL frame-based ontology of emotions: the Emotion Frames Ontology (EFO). EFO treats emotions as semantic frames, with a set of semantic roles that capture the different aspects of emotional experience. EFO follows pattern-based ontology design, and is aligned to the DOLCE foundational ontology. EFO is used to model multiple emotion theories, which can be cross-linked as modules in an Emotion Ontology Network. In this paper, we exemplify it by modeling Ekman's Basic Emotions (BE) Theory as an EFO-BE module, and demonstrate how to perform automated inferences on the representation of emotion situations. EFO-BE has been evaluated by lexicalizing the BE emotion frames from within the Framester knowledge graph, and implementing a graph-based emotion detector from text. In addition, an EFO integration of multimodal datasets, including emotional speech and emotional face expressions, has been performed to enable further inquiry into crossmodal emotion semantics.
- [134] arXiv:2401.10781 [ pdf , other ]
-
Title: Metric Dynamic Equilibrium LogicAuthors: Arvid Becker , Pedro Cabalar , Martín Diéguez , Luis Fariñas , Torsten Schaub , Anna SchuhmannComments: arXiv admin note: text overlap with arXiv:2304.14778Subjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: In temporal extensions of Answer Set Programming (ASP) based on linear-time, the behavior of dynamic systems is captured by sequences of states. While this representation reflects their relative order, it abstracts away the specific times associated with each state. In many applications, however, timing constraints are important like, for instance, when planning and scheduling go hand in hand. We address this by developing a metric extension of linear-time Dynamic Equilibrium Logic, in which dynamic operators are constrained by intervals over integers. The resulting Metric Dynamic Equilibrium Logic provides the foundation of an ASP-based approach for specifying qualitative and quantitative dynamic constraints. As such, it constitutes the most general among a whole spectrum of temporal extensions of Equilibrium Logic. In detail, we show that it encompasses Temporal, Dynamic, Metric, and regular Equilibrium Logic, as well as its classic counterparts once the law of the excluded middle is added.
- [135] arXiv:2401.10819 [ pdf , other ]
-
Title: Optimisation in Neurosymbolic Learning SystemsAuthors: Emile van KriekenComments: PhD dissertationSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Neurosymbolic AI aims to integrate deep learning with symbolic AI. This integration has many promises, such as decreasing the amount of data required to train a neural network, improving the explainability and interpretability of answers given by models and verifying the correctness of trained systems. We study neurosymbolic learning, where we have both data and background knowledge expressed using symbolic languages. How do we connect the symbolic and neural components to communicate this knowledge? One option is fuzzy reasoning, which studies degrees of truth. For example, being tall is not a binary concept. Instead, probabilistic reasoning studies the probability that something is true or will happen. Our first research question studies how different forms of fuzzy reasoning combine with learning. We find surprising results like a connection to the Raven paradox stating we confirm "ravens are black" when we observe a green apple. In this study, we did not use the background knowledge when we deployed our models after training. In our second research question, we studied how to use background knowledge in deployed models. We developed a new neural network layer based on fuzzy reasoning. Probabilistic reasoning is a natural fit for neural networks, which we usually train to be probabilistic. However, they are expensive to compute and do not scale well to large tasks. In our third research question, we study how to connect probabilistic reasoning with neural networks by sampling to estimate averages, while in the final research question, we study scaling probabilistic neurosymbolic learning to much larger problems than before. Our insight is to train a neural network with synthetic data to predict the result of probabilistic reasoning.
- [136] arXiv:2401.10904 [ pdf , ps , other ]
-
Title: A Review of Findings from Neuroscience and Cognitive Psychology as Possible Inspiration for the Path to Artificial General IntelligenceAuthors: Florin LeonComments: 143 pages, 49 figures, 244 referencesSubjects: Artificial Intelligence (cs.AI)
Abstract: This review aims to contribute to the quest for artificial general intelligence by examining neuroscience and cognitive psychology methods for potential inspiration. Despite the impressive advancements achieved by deep learning models in various domains, they still have shortcomings in abstract reasoning and causal understanding. Such capabilities should be ultimately integrated into artificial intelligence systems in order to surpass data-driven limitations and support decision making in a way more similar to human intelligence. This work is a vertical review that attempts a wide-ranging exploration of brain function, spanning from lower-level biological neurons, spiking neural networks, and neuronal ensembles to higher-level concepts such as brain anatomy, vector symbolic architectures, cognitive and categorization models, and cognitive architectures. The hope is that these concepts may offer insights for solutions in artificial general intelligence.
- [137] arXiv:2401.10938 [ pdf , ps , other ]
-
Title: Even-if Explanations: Formal Foundations, Priorities and ComplexityAuthors: Gianvincenzo Alfano , Sergio Greco , Domenico Mandaglio , Francesco Parisi , Reza Shahbazian , Irina TrubitsynaComments: arXiv admin note: text overlap with arXiv:2010.12265 by other authorsSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: EXplainable AI has received significant attention in recent years. Machine learning models often operate as black boxes, lacking explainability and transparency while supporting decision-making processes. Local post-hoc explainability queries attempt to answer why individual inputs are classified in a certain way by a given model. While there has been important work on counterfactual explanations, less attention has been devoted to semifactual ones. In this paper, we focus on local post-hoc explainability queries within the semifactual `even-if' thinking and their computational complexity among different classes of models, and show that both linear and tree-based models are strictly more interpretable than neural networks. After this, we introduce a preference-based framework that enables users to personalize explanations based on their preferences, both in the case of semifactuals and counterfactuals, enhancing interpretability and user-centricity. Finally, we explore the complexity of several interpretability problems in the proposed preference-based framework and provide algorithms for polynomial cases.
- [138] arXiv:2401.10969 [ pdf , other ]
-
Title: MacroSwarm: A Field-based Compositional Framework for Swarm ProgrammingSubjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
Abstract: Swarm behaviour engineering is an area of research that seeks to investigate methods and techniques for coordinating computation and action within groups of simple agents to achieve complex global goals like pattern formation, collective movement, clustering, and distributed sensing. Despite recent progress in the analysis and engineering of swarms (of drones, robots, vehicles), there is still a need for general design and implementation methods and tools that can be used to define complex swarm behaviour in a principled way. To contribute to this quest, this article proposes a new field-based coordination approach, called MacroSwarm, to design and program swarm behaviour in terms of reusable and fully composable functional blocks embedding collective computation and coordination. Based on the macroprogramming paradigm of aggregate computing, MacroSwarm builds on the idea of expressing each swarm behaviour block as a pure function mapping sensing fields into actuation goal fields, e.g. including movement vectors. In order to demonstrate the expressiveness, compositionality, and practicality of MacroSwarm as a framework for collective intelligence, we perform a variety of simulations covering common patterns of flocking, morphogenesis, and collective decision-making.
- [139] arXiv:2401.11094 [ pdf , other ]
-
Title: TypeDance: Creating Semantic Typographic Logos from Image through Personalized GenerationComments: 24 pages, 9 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Semantic typographic logos harmoniously blend typeface and imagery to represent semantic concepts while maintaining legibility. Conventional methods using spatial composition and shape substitution are hindered by the conflicting requirement for achieving seamless spatial fusion between geometrically dissimilar typefaces and semantics. While recent advances made AI generation of semantic typography possible, the end-to-end approaches exclude designer involvement and disregard personalized design. This paper presents TypeDance, an AI-assisted tool incorporating design rationales with the generative model for personalized semantic typographic logo design. It leverages combinable design priors extracted from uploaded image exemplars and supports type-imagery mapping at various structural granularity, achieving diverse aesthetic designs with flexible control. Additionally, we instantiate a comprehensive design workflow in TypeDance, including ideation, selection, generation, evaluation, and iteration. A two-task user evaluation, including imitation and creation, confirmed the usability of TypeDance in design across different usage scenarios
- [140] arXiv:2401.11472 [ pdf , other ]
-
Title: Abstract Weighted Based Gradual Semantics in Argumentation TheorySubjects: Artificial Intelligence (cs.AI)
Abstract: Weighted gradual semantics provide an acceptability degree to each argument representing the strength of the argument, computed based on factors including background evidence for the argument, and taking into account interactions between this argument and others. We introduce four important problems linking gradual semantics and acceptability degrees. First, we reexamine the inverse problem, seeking to identify the argument weights of the argumentation framework which lead to a specific final acceptability degree. Second, we ask whether the function mapping between argument weights and acceptability degrees is injective or a homeomorphism onto its image. Third, we ask whether argument weights can be found when preferences, rather than acceptability degrees for arguments are considered. Fourth, we consider the topology of the space of valid acceptability degrees, asking whether gaps exist in this space. While different gradual semantics have been proposed in the literature, in this paper, we identify a large family of weighted gradual semantics, called abstract weighted based gradual semantics. These generalise many of the existing semantics while maintaining desirable properties such as convergence to a unique fixed point. We also show that a sub-family of the weighted gradual semantics, called abstract weighted (Lp,lambda,mu,A)-based gradual semantics and which include well-known semantics, solve all four of the aforementioned problems.
- [141] arXiv:2401.11553 [ pdf , ps , other ]
-
Title: Taxi dispatching strategies with compensationsJournal-ref: Expert Systems with Applications, Volume 122 (2019)Subjects: Artificial Intelligence (cs.AI)
Abstract: Urban mobility efficiency is of utmost importance in big cities. Taxi vehicles are key elements in daily traffic activity. The advance of ICT and geo-positioning systems has given rise to new opportunities for improving the efficiency of taxi fleets in terms of waiting times of passengers, cost and time for drivers, traffic density, CO2 emissions, etc., by using more informed, intelligent dispatching. Still, the explicit spatial and temporal components, as well as the scale and, in particular, the dynamicity of the problem of pairing passengers and taxis in big towns, render traditional approaches for solving standard assignment problem useless for this purpose, and call for intelligent approximation strategies based on domain-specific heuristics. Furthermore, taxi drivers are often autonomous actors and may not agree to participate in assignments that, though globally efficient, may not be sufficently beneficial for them individually. This paper presents a new heuristic algorithm for taxi assignment to customers that considers taxi reassignments if this may lead to globally better solutions. In addition, as such new assignments may reduce the expected revenues of individual drivers, we propose an economic compensation scheme to make individually rational drivers agree to proposed modifications in their assigned clients. We carried out a set of experiments, where several commonly used assignment strategies are compared to three different instantiations of our heuristic algorithm. The results indicate that our proposal has the potential to reduce customer waiting times in fleets of autonomous taxis, while being also beneficial from an economic point of view.
- [142] arXiv:2401.11753 [ pdf , ps , other ]
-
Title: From Knowledge Organization to Knowledge Representation and BackComments: Accepted @ Annals of Library and Information Studies (ALIS) Journal - Ranganathan Commemorative Issue (2024)Subjects: Artificial Intelligence (cs.AI) ; Digital Libraries (cs.DL)
Abstract: Knowledge Organization (KO) and Knowledge Representation (KR) have been the two mainstream methodologies of knowledge modelling in the Information Science community and the Artificial Intelligence community, respectively. The facet-analytical tradition of KO has developed an exhaustive set of guiding canons for ensuring quality in organising and managing knowledge but has remained limited in terms of technology-driven activities to expand its scope and services beyond the bibliographic universe of knowledge. KR, on the other hand, boasts of a robust ecosystem of technologies and technology-driven service design which can be tailored to model any entity or scale to any service in the entire universe of knowledge. This paper elucidates both the facet-analytical KO and KR methodologies in detail and provides a functional mapping between them. Out of the mapping, the paper proposes an integrated KR-enriched KO methodology with all the standard components of a KO methodology plus the advanced technologies provided by the KR approach. The practical benefits of the methodological integration has been exemplified through the flagship application of the Digital University at the University of Trento, Italy.
- [143] arXiv:2401.11848 [ pdf , other ]
-
Title: ExtruOnt: An ontology for describing a type of manufacturing machine for Industry 4.0 systemsComments: This is the accepted manuscript. The definitive, peer reviewed and edited version of this article is published in Semantic Web 11(6): 887-909 (2020) this https URLJournal-ref: Semantic Web 11(6): 887-909 (2020)Subjects: Artificial Intelligence (cs.AI)
Abstract: Semantically rich descriptions of manufacturing machines, offered in a machine-interpretable code, can provide interesting benefits in Industry 4.0 scenarios. However, the lack of that type of descriptions is evident. In this paper we present the development effort made to build an ontology, called ExtruOnt, for describing a type of manufacturing machine, more precisely, a type that performs an extrusion process (extruder). Although the scope of the ontology is restricted to a concrete domain, it could be used as a model for the development of other ontologies for describing manufacturing machines in Industry 4.0 scenarios. The terms of the ExtruOnt ontology provide different types of information related with an extruder, which are reflected in distinct modules that constitute the ontology. Thus, it contains classes and properties for expressing descriptions about components of an extruder, spatial connections, features, and 3D representations of those components, and finally the sensors used to capture indicators about the performance of this type of machine. The ontology development process has been carried out in close collaboration with domain experts.
- [144] arXiv:2401.11865 [ pdf , ps , other ]
-
Title: Toward Semantic Interoperability of Electronic Health RecordsComments: This is the Accepted Manuscript. The definitive, peer reviewed and edited version of this article is: Idoia Berges, Jes\'us Berm\'udez, Arantza Illarramendi: Toward Semantic Interoperability of Electronic Health Records. IEEE Trans. Inf. Technol. Biomed. 16(3): 424-431 (2012). DOI:10.1109/TITB.2011.2180917. Copyright 2011 IEEEJournal-ref: IEEE Trans. Inf. Technol. Biomed. 16(3): 424-431 (2012)Subjects: Artificial Intelligence (cs.AI)
Abstract: Although the goal of achieving semantic interoperability of electronic health records (EHRs) is pursued by many researchers, it has not been accomplished yet. In this paper, we present a proposal that smoothes out the way toward the achievement of that goal. In particular, our study focuses on medical diagnoses statements. In summary, the main contributions of our ontology-based proposal are the following: first, it includes a canonical ontology whose EHR-related terms focus on semantic aspects. As a result, their descriptions are independent of languages and technology aspects used in different organizations to represent EHRs. Moreover, those terms are related to their corresponding codes in well-known medical terminologies. Second, it deals with modules that allow obtaining rich ontological representations of EHR information managed by proprietary models of health information systems. The features of one specific module are shown as reference. Third, it considers the necessary mapping axioms between ontological terms enhanced with so-called path mappings. This feature smoothes out structural differences between heterogeneous EHR representations, allowing proper alignment of information.
- [145] arXiv:2401.11898 [ pdf , other ]
-
Title: Automated Completion of Statements and Proofs in Synthetic Geometry: an Approach based on Constraint SolvingAuthors: Salwa Tabet Gonzalez (University of Strasbourg), Predrag Janičić (University of Belgrade), Julien Narboux (University of Strasbourg)Comments: In Proceedings ADG 2023, arXiv:2401.10725Journal-ref: EPTCS 398, 2024, pp. 21-37Subjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: Conjecturing and theorem proving are activities at the center of mathematical practice and are difficult to separate. In this paper, we propose a framework for completing incomplete conjectures and incomplete proofs. The framework can turn a conjecture with missing assumptions and with an under-specified goal into a proper theorem. Also, the proposed framework can help in completing a proof sketch into a human-readable and machine-checkable proof. Our approach is focused on synthetic geometry, and uses coherent logic and constraint solving. The proposed approach is uniform for all three kinds of tasks, flexible and, to our knowledge, unique such approach.
- [146] arXiv:2401.11903 [ pdf , other ]
-
Title: Automation of Triangle Ruler-and-Compass Constructions Using Constraint SolversAuthors: Milan Banković (Faculty of Mathematics, University of Belgrade, Serbia)Comments: In Proceedings ADG 2023, arXiv:2401.10725Journal-ref: EPTCS 398, 2024, pp. 62-72Subjects: Artificial Intelligence (cs.AI)
Abstract: In this paper, we present an approach to automated solving of triangle ruler-and-compass construction problems using finite-domain constraint solvers. The constraint model is described in the MiniZinc modeling language, and is based on the automated planning. The main benefit of using general constraint solvers for such purpose, instead of developing dedicated tools, is that we can rely on the efficient search that is already implemented within the solver, enabling us to focus on geometric aspects of the problem. We may also use the solver's built-in optimization capabilities to search for the shortest possible constructions. We evaluate our approach on 74 solvable problems from the Wernick's list, and compare it to the dedicated triangle construction solver ArgoTriCS. The results show that our approach is comparable to dedicated tools, while it requires much less effort to implement. Also, our model often finds shorter constructions, thanks to the optimization capabilities offered by the constraint solvers.
- [147] arXiv:2401.11905 [ pdf , other ]
-
Title: Considerations on Approaches and Metrics in Automated Theorem Generation/Finding in GeometryAuthors: Pedro Quaresma (University of Coimbra), Pierluigi Graziani (University of Urbino), Stefano M. Nicoletti (University of Twente)Comments: In Proceedings ADG 2023, arXiv:2401.10725Journal-ref: EPTCS 398, 2024, pp. 85-100Subjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO)
Abstract: The pursue of what are properties that can be identified to permit an automated reasoning program to generate and find new and interesting theorems is an interesting research goal (pun intended). The automatic discovery of new theorems is a goal in itself, and it has been addressed in specific areas, with different methods. The separation of the "weeds", uninteresting, trivial facts, from the "wheat", new and interesting facts, is much harder, but is also being addressed by different authors using different approaches. In this paper we will focus on geometry. We present and discuss different approaches for the automatic discovery of geometric theorems (and properties), and different metrics to find the interesting theorems among all those that were generated. After this description we will introduce the first result of this article: an undecidability result proving that having an algorithmic procedure that decides for every possible Turing Machine that produces theorems, whether it is able to produce also interesting theorems, is an undecidable problem. Consequently, we will argue that judging whether a theorem prover is able to produce interesting theorems remains a non deterministic task, at best a task to be addressed by program based in an algorithm guided by heuristics criteria. Therefore, as a human, to satisfy this task two things are necessary: an expert survey that sheds light on what a theorem prover/finder of interesting geometric theorems is, and - to enable this analysis - other surveys that clarify metrics and approaches related to the interestingness of geometric theorems. In the conclusion of this article we will introduce the structure of two of these surveys - the second result of this article - and we will discuss some future work.
- [148] arXiv:2401.12108 [ pdf , other ]
-
Title: On-Time Delivery in Crowdshipping Systems: An Agent-Based Approach Using Streaming DataJournal-ref: Frontiers in Artificial Intelligence and Applications. Volume 325: ECAI 2020. Pages 51-58Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: In parcel delivery, the "last mile" from the parcel hub to the customer is costly, especially for time-sensitive delivery tasks that have to be completed within hours after arrival. Recently, crowdshipping has attracted increased attention as a new alternative to traditional delivery modes. In crowdshipping, private citizens ("the crowd") perform short detours in their daily lives to contribute to parcel delivery in exchange for small incentives. However, achieving desirable crowd behavior is challenging as the crowd is highly dynamic and consists of autonomous, self-interested individuals. Leveraging crowdshipping for time-sensitive deliveries remains an open challenge. In this paper, we present an agent-based approach to on-time parcel delivery with crowds. Our system performs data stream processing on the couriers' smartphone sensor data to predict delivery delays. Whenever a delay is predicted, the system attempts to forge an agreement for transferring the parcel from the current deliverer to a more promising courier nearby. Our experiments show that through accurate delay predictions and purposeful task transfers many delays can be prevented that would occur without our approach.
- [149] arXiv:2401.12247 [ pdf , ps , other ]
-
Title: Exploring consumers response to text-based chatbots in e-commerce: The moderating role of task complexity and chatbot disclosureComments: Internet Research (2021)Subjects: Artificial Intelligence (cs.AI)
Abstract: Artificial intelligence based chatbots have brought unprecedented business potential. This study aims to explore consumers trust and response to a text-based chatbot in ecommerce, involving the moderating effects of task complexity and chatbot identity disclosure. A survey method with 299 useable responses was conducted in this research. This study adopted the ordinary least squares regression to test the hypotheses. First, the consumers perception of both the empathy and friendliness of the chatbot positively impacts their trust in it. Second, task complexity negatively moderates the relationship between friendliness and consumers trust. Third, disclosure of the text based chatbot negatively moderates the relationship between empathy and consumers trust, while it positively moderates the relationship between friendliness and consumers trust. Fourth, consumers trust in the chatbot increases their reliance on the chatbot and decreases their resistance to the chatbot in future interactions. Adopting the stimulus organism response framework, this study provides important insights on consumers perception and response to the text-based chatbot. The findings of this research also make suggestions that can increase consumers positive responses to text based chatbots. Extant studies have investigated the effects of automated bots attributes on consumers perceptions. However, the boundary conditions of these effects are largely ignored. This research is one of the first attempts to provide a deep understanding of consumers responses to a chatbot.
- [150] arXiv:2401.12322 [ pdf , ps , other ]
-
Title: Smart Recommendations for Renting Bikes in Bike Sharing SystemsJournal-ref: Applied Sciences, Volume 11, Issue 20 (2021)Subjects: Artificial Intelligence (cs.AI)
Abstract: Vehicle-sharing systems -- such as bike-, car-, or motorcycle-sharing systems -- have become increasingly popular in big cities in recent years. On the one hand, they provide a cheaper and environmentally friendlier means of transportation than private cars, and on the other hand, they satisfy the individual mobility demands of citizens better than traditional public transport systems. One of their advantages in this regard is their availability, e.g., the possibility of taking (or leaving) a vehicle almost anywhere in a city. This availability obviously depends on different strategic and operational management decisions and policies, such as the dimension of the fleet or the (re)distribution of vehicles. Agglutination problems -- where, due to usage patterns, available vehicles are concentrated in certain areas, whereas no vehicles are available in others -- are quite common in such systems, and need to be dealt with. Research has been dedicated to this problem, specifying different techniques to reduce imbalanced situations. In this paper, we present and compare strategies for recommending stations to users who wish to rent or return bikes in station-based bike-sharing systems. Our first contribution is a novel recommendation strategy based on queuing theory that recommends stations based on their utility to the user in terms of lower distance and higher probability of finding a bike or slot. Then, we go one step further, defining a strategy that recommends stations by combining the utility of a particular user with the utility of the global system, measured in terms of the improvement in the distribution of bikes and slots with respect to the expected future demand, with the aim of implicitly avoiding or alleviating balancing problems. We present several experiments to evaluate our proposal with real data from the bike sharing system BiciMAD in Madrid.
- [151] arXiv:2401.12324 [ pdf , ps , other ]
-
Title: Streamlining Advanced Taxi Assignment Strategies based on Legal AnalysisAuthors: Holger Billhardt , José-Antonio Santos , Alberto Fernández , Mar Moreno , Sascha Ossowski , José A. RodríguezJournal-ref: Neurocomputing, Volume 438 (2022)Subjects: Artificial Intelligence (cs.AI)
Abstract: In recent years many novel applications have appeared that promote the provision of services and activities in a collaborative manner. The key idea behind such systems is to take advantage of idle or underused capacities of existing resources, in order to provide improved services that assist people in their daily tasks, with additional functionality, enhanced efficiency, and/or reduced cost. Particularly in the domain of urban transportation, many researchers have put forward novel ideas, which are then implemented and evaluated through prototypes that usually draw upon AI methods and tools. However, such proposals also bring up multiple non-technical issues that need to be identified and addressed adequately if such systems are ever meant to be applied to the real world. While, in practice, legal and ethical aspects related to such AI-based systems are seldomly considered in the beginning of the research and development process, we argue that they not only restrict design decisions, but can also help guiding them. In this manuscript, we set out from a prototype of a taxi coordination service that mediates between individual (and autonomous) taxis and potential customers. After representing key aspects of its operation in a semi-structured manner, we analyse its viability from the viewpoint of current legal restrictions and constraints, so as to identify additional non-functional requirements as well as options to address them. Then, we go one step ahead, and actually modify the existing prototype to incorporate the previously identified recommendations. Performing experiments with this improved system helps us identify the most adequate option among several legally admissible alternatives.
- [152] arXiv:2401.12379 [ pdf , other ]
-
Title: Analyzing the Effectiveness of Large Language Models on Text-to-SQL SynthesisSubjects: Artificial Intelligence (cs.AI) ; Databases (cs.DB); Programming Languages (cs.PL)
Abstract: This study investigates various approaches to using Large Language Models (LLMs) for Text-to-SQL program synthesis, focusing on the outcomes and insights derived. Employing the popular Text-to-SQL dataset, spider, the goal was to input a natural language question along with the database schema and output the correct SQL SELECT query. The initial approach was to fine-tune a local and open-source model to generate the SELECT query. After QLoRa fine-tuning WizardLM's WizardCoder-15B model on the spider dataset, the execution accuracy for generated queries rose to a high of 61%. With the second approach, using the fine-tuned gpt-3.5-turbo-16k (Few-shot) + gpt-4-turbo (Zero-shot error correction), the execution accuracy reached a high of 82.1%. Of all the incorrect queries, most can be categorized into a seven different categories of what went wrong: selecting the wrong columns or wrong order of columns, grouping by the wrong column, predicting the wrong values in conditionals, using different aggregates than the ground truth, extra or too few JOIN clauses, inconsistencies in the Spider dataset, and lastly completely incorrect query structure. Most if not all of the queries fall into these categories and it is insightful to understanding where the faults still lie with LLM program synthesis and where they can be improved.
- [153] arXiv:2401.12435 [ pdf , ps , other ]
-
Title: Quantitative Analysis of Molecular Transport in the Extracellular Space Using Physics-Informed Neural NetworkAuthors: Jiayi Xie , Hongfeng Li , Jin Cheng , Qingrui Cai , Hanbo Tan , Lingyun Zu , Xiaobo Qu , Hongbin HanSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Analysis of PDEs (math.AP)
Abstract: The brain extracellular space (ECS), an irregular, extremely tortuous nanoscale space located between cells or between cells and blood vessels, is crucial for nerve cell survival. It plays a pivotal role in high-level brain functions such as memory, emotion, and sensation. However, the specific form of molecular transport within the ECS remain elusive. To address this challenge, this paper proposes a novel approach to quantitatively analyze the molecular transport within the ECS by solving an inverse problem derived from the advection-diffusion equation (ADE) using a physics-informed neural network (PINN). PINN provides a streamlined solution to the ADE without the need for intricate mathematical formulations or grid settings. Additionally, the optimization of PINN facilitates the automatic computation of the diffusion coefficient governing long-term molecule transport and the velocity of molecules driven by advection. Consequently, the proposed method allows for the quantitative analysis and identification of the specific pattern of molecular transport within the ECS through the calculation of the Peclet number. Experimental validation on two datasets of magnetic resonance images (MRIs) captured at different time points showcases the effectiveness of the proposed method. Notably, our simulations reveal identical molecular transport patterns between datasets representing rats with tracer injected into the same brain region. These findings highlight the potential of PINN as a promising tool for comprehensively exploring molecular transport within the ECS.
- [154] arXiv:2401.12459 [ pdf , other ]
-
Title: Towards Socially and Morally Aware RL agent: Reward Design With LLMAuthors: Zhaoyue WangSubjects: Artificial Intelligence (cs.AI)
Abstract: When we design and deploy an Reinforcement Learning (RL) agent, reward functions motivates agents to achieve an objective. An incorrect or incomplete specification of the objective can result in behavior that does not align with human values - failing to adhere with social and moral norms that are ambiguous and context dependent, and cause undesired outcomes such as negative side effects and exploration that is unsafe. Previous work have manually defined reward functions to avoid negative side effects, use human oversight for safe exploration, or use foundation models as planning tools. This work studies the ability of leveraging Large Language Models (LLM)' understanding of morality and social norms on safe exploration augmented RL methods. This work evaluates language model's result against human feedbacks and demonstrates language model's capability as direct reward signals.
- [155] arXiv:2401.12467 [ pdf , other ]
-
Title: An open dataset for the evolution of oracle bone characters: EVOBCAuthors: Haisu Guan , Jinpeng Wan , Yuliang Liu , Pengjie Wang , Kaile Zhang , Zhebin Kuang , Xinyu Wang , Xiang Bai , Lianwen JinSubjects: Artificial Intelligence (cs.AI)
Abstract: The earliest extant Chinese characters originate from oracle bone inscriptions, which are closely related to other East Asian languages. These inscriptions hold immense value for anthropology and archaeology. However, deciphering oracle bone script remains a formidable challenge, with only approximately 1,600 of the over 4,500 extant characters elucidated to date. Further scholarly investigation is required to comprehensively understand this ancient writing system. Artificial Intelligence technology is a promising avenue for deciphering oracle bone characters, particularly concerning their evolution. However, one of the challenges is the lack of datasets mapping the evolution of these characters over time. In this study, we systematically collected ancient characters from authoritative texts and websites spanning six historical stages: Oracle Bone Characters - OBC (15th century B.C.), Bronze Inscriptions - BI (13th to 221 B.C.), Seal Script - SS (11th to 8th centuries B.C.), Spring and Autumn period Characters - SAC (770 to 476 B.C.), Warring States period Characters - WSC (475 B.C. to 221 B.C.), and Clerical Script - CS (221 B.C. to 220 A.D.). Subsequently, we constructed an extensive dataset, namely EVolution Oracle Bone Characters (EVOBC), consisting of 229,170 images representing 13,714 distinct character categories. We conducted validation and simulated deciphering on the constructed dataset, and the results demonstrate its high efficacy in aiding the study of oracle bone script. This openly accessible dataset aims to digitalize ancient Chinese scripts across multiple eras, facilitating the decipherment of oracle bone script by examining the evolution of glyph forms.
- [156] arXiv:2401.12497 [ pdf , other ]
-
Title: Building Minimal and Reusable Causal State Abstractions for Reinforcement LearningComments: Accepted at AAAI24Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is to learn state abstractions, which only keep the necessary variables for learning the tasks at hand. This paper introduces Causal Bisimulation Modeling (CBM), a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction. CBM leverages and improves implicit modeling to train a high-fidelity causal dynamics model that can be reused for all tasks in the same environment. Empirical validation on manipulation environments and Deepmind Control Suite reveals that CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones. Furthermore, the derived state abstractions allow a task learner to achieve near-oracle levels of sample efficiency and outperform baselines on all tasks.
- [157] arXiv:2401.12550 [ pdf , other ]
-
Title: UR4NNV: Neural Network Verification, Under-approximation Reachability Works!Authors: Zhen Liang , Taoran Wu , Ran Zhao , Bai Xue , Ji Wang , Wenjing Yang , Shaojun Deng , Wanwei LiuComments: 11 pages, 4 figuresSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Recently, formal verification of deep neural networks (DNNs) has garnered considerable attention, and over-approximation based methods have become popular due to their effectiveness and efficiency. However, these strategies face challenges in addressing the "unknown dilemma" concerning whether the exact output region or the introduced approximation error violates the property in question. To address this, this paper introduces the UR4NNV verification framework, which utilizes under-approximation reachability analysis for DNN verification for the first time. UR4NNV focuses on DNNs with Rectified Linear Unit (ReLU) activations and employs a binary tree branch-based under-approximation algorithm. In each epoch, UR4NNV under-approximates a sub-polytope of the reachable set and verifies this polytope against the given property. Through a trial-and-error approach, UR4NNV effectively falsifies DNN properties while providing confidence levels when reaching verification epoch bounds and failing falsifying properties. Experimental comparisons with existing verification methods demonstrate the effectiveness and efficiency of UR4NNV, significantly reducing the impact of the "unknown dilemma".
- [158] arXiv:2401.12557 [ pdf , ps , other ]
-
Title: Balancing the AI Strength of Roles in Self-Play Training with Regret Matching+Authors: Xiaoxi WangSubjects: Artificial Intelligence (cs.AI)
Abstract: When training artificial intelligence for games encompassing multiple roles, the development of a generalized model capable of controlling any character within the game presents a viable option. This strategy not only conserves computational resources and time during the training phase but also reduces resource requirements during deployment. training such a generalized model often encounters challenges related to uneven capabilities when controlling different roles. A simple method is introduced based on Regret Matching+, which facilitates a more balanced performance of strength by the model when controlling various roles.
- [159] arXiv:2401.12599 [ pdf , other ]
-
Title: Revolutionizing Retrieval-Augmented Generation with Enhanced PDF Structure RecognitionAuthors: Demiao Lin (chatdoc.com)Comments: 18 pages, 16 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: With the rapid development of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) has become a predominant method in the field of professional knowledge-based question answering. Presently, major foundation model companies have opened up Embedding and Chat API interfaces, and frameworks like LangChain have already integrated the RAG process. It appears that the key models and steps in RAG have been resolved, leading to the question: are professional knowledge QA systems now approaching perfection? This article discovers that current primary methods depend on the premise of accessing high-quality text corpora. However, since professional documents are mainly stored in PDFs, the low accuracy of PDF parsing significantly impacts the effectiveness of professional knowledge-based QA. We conducted an empirical RAG experiment across hundreds of questions from the corresponding real-world professional documents. The results show that, ChatDOC, a RAG system equipped with a panoptic and pinpoint PDF parser, retrieves more accurate and complete segments, and thus better answers. Empirical experiments show that ChatDOC is superior to baseline on nearly 47% of questions, ties for 38% of cases, and falls short on only 15% of cases. It shows that we may revolutionize RAG with enhanced PDF structure recognition.
- [160] arXiv:2401.12624 [ pdf , other ]
-
Title: Knowledge Distillation from Language-Oriented to Emergent Communication for Multi-Agent Remote ControlSubjects: Artificial Intelligence (cs.AI) ; Information Theory (cs.IT); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract: In this work, we compare emergent communication (EC) built upon multi-agent deep reinforcement learning (MADRL) and language-oriented semantic communication (LSC) empowered by a pre-trained large language model (LLM) using human language. In a multi-agent remote navigation task, with multimodal input data comprising location and channel maps, it is shown that EC incurs high training cost and struggles when using multimodal data, whereas LSC yields high inference computing cost due to the LLM's large size. To address their respective bottlenecks, we propose a novel framework of language-guided EC (LEC) by guiding the EC training using LSC via knowledge distillation (KD). Simulations corroborate that LEC achieves faster travel time while avoiding areas with poor channel conditions, as well as speeding up the MADRL training convergence by up to 61.8% compared to EC.
- [161] arXiv:2401.12666 [ pdf , other ]
-
Title: EL-VIT: Probing Vision Transformer with Interactive VisualizationComments: 10 pages, 7 figures, conferenceSubjects: Artificial Intelligence (cs.AI)
Abstract: Nowadays, Vision Transformer (ViT) is widely utilized in various computer vision tasks, owing to its unique self-attention mechanism. However, the model architecture of ViT is complex and often challenging to comprehend, leading to a steep learning curve. ViT developers and users frequently encounter difficulties in interpreting its inner workings. Therefore, a visualization system is needed to assist ViT users in understanding its functionality. This paper introduces EL-VIT, an interactive visual analytics system designed to probe the Vision Transformer and facilitate a better understanding of its operations. The system consists of four layers of visualization views. The first three layers include model overview, knowledge background graph, and model detail view. These three layers elucidate the operation process of ViT from three perspectives: the overall model architecture, detailed explanation, and mathematical operations, enabling users to understand the underlying principles and the transition process between layers. The fourth interpretation view helps ViT users and experts gain a deeper understanding by calculating the cosine similarity between patches. Our two usage scenarios demonstrate the effectiveness and usability of EL-VIT in helping ViT users understand the working mechanism of ViT.
- [162] arXiv:2401.12672 [ pdf , other ]
-
Title: ChatGraph: Chat with Your GraphsSubjects: Artificial Intelligence (cs.AI)
Abstract: Graph analysis is fundamental in real-world applications. Traditional approaches rely on SPARQL-like languages or clicking-and-dragging interfaces to interact with graph data. However, these methods either require users to possess high programming skills or support only a limited range of graph analysis functionalities. To address the limitations, we propose a large language model (LLM)-based framework called ChatGraph. With ChatGraph, users can interact with graphs through natural language, making it easier to use and more flexible than traditional approaches. The core of ChatGraph lies in generating chains of graph analysis APIs based on the understanding of the texts and graphs inputted in the user prompts. To achieve this, ChatGraph consists of three main modules: an API retrieval module that searches for relevant APIs, a graph-aware LLM module that enables the LLM to comprehend graphs, and an API chain-oriented finetuning module that guides the LLM in generating API chains.
- [163] arXiv:2401.12700 [ pdf , other ]
-
Title: Securing Recommender System via Cooperative TrainingComments: arXiv admin note: text overlap with arXiv:2210.13762Subjects: Artificial Intelligence (cs.AI)
Abstract: Recommender systems are often susceptible to well-crafted fake profiles, leading to biased recommendations. Among existing defense methods, data-processing-based methods inevitably exclude normal samples, while model-based methods struggle to enjoy both generalization and robustness. To this end, we suggest integrating data processing and the robust model to propose a general framework, Triple Cooperative Defense (TCD), which employs three cooperative models that mutually enhance data and thereby improve recommendation robustness. Furthermore, Considering that existing attacks struggle to balance bi-level optimization and efficiency, we revisit poisoning attacks in recommender systems and introduce an efficient attack strategy, Co-training Attack (Co-Attack), which cooperatively optimizes the attack optimization and model training, considering the bi-level setting while maintaining attack efficiency. Moreover, we reveal a potential reason for the insufficient threat of existing attacks is their default assumption of optimizing attacks in undefended scenarios. This overly optimistic setting limits the potential of attacks. Consequently, we put forth a Game-based Co-training Attack (GCoAttack), which frames the proposed CoAttack and TCD as a game-theoretic process, thoroughly exploring CoAttack's attack potential in the cooperative training of attack and defense. Extensive experiments on three real datasets demonstrate TCD's superiority in enhancing model robustness. Additionally, we verify that the two proposed attack strategies significantly outperform existing attacks, with game-based GCoAttack posing a greater poisoning threat than CoAttack.
- [164] arXiv:2401.12731 [ pdf , other ]
-
Title: The Distributional Uncertainty of the SHAP score in Explainable Machine LearningAuthors: Santiago Cifuentes , Leopoldo Bertossi , Nina Pardal , Sergio Abriola , Maria Vanina Martinez , Miguel RomeroSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Abstract: Attribution scores reflect how important the feature values in an input entity are for the output of a machine learning model. One of the most popular attribution scores is the SHAP score, which is an instantiation of the general Shapley value used in coalition game theory. The definition of this score relies on a probability distribution on the entity population. Since the exact distribution is generally unknown, it needs to be assigned subjectively or be estimated from data, which may lead to misleading feature scores. In this paper, we propose a principled framework for reasoning on SHAP scores under unknown entity population distributions. In our framework, we consider an uncertainty region that contains the potential distributions, and the SHAP score of a feature becomes a function defined over this region. We study the basic problems of finding maxima and minima of this function, which allows us to determine tight ranges for the SHAP scores of all features. In particular, we pinpoint the complexity of these problems, and other related ones, showing them to be NP-complete. Finally, we present experiments on a real-world dataset, showing that our framework may contribute to a more robust feature scoring.
- [165] arXiv:2401.12783 [ pdf , other ]
-
Title: A Review of Deep Learning Methods for Photoplethysmography DataAuthors: Guangkun Nie , Jiabao Zhu , Gongzheng Tang , Deyun Zhang , Shijia Geng , Qinghao Zhao , Shenda HongSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract: Photoplethysmography (PPG) is a highly promising device due to its advantages in portability, user-friendly operation, and non-invasive capabilities to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this review, we systematically reviewed papers that applied deep learning models to process PPG data between January 1st of 2017 and July 31st of 2023 from Google Scholar, PubMed and Dimensions. Each paper is analyzed from three key perspectives: tasks, models, and data. We finally extracted 193 papers where different deep learning frameworks were used to process PPG signals. Based on the tasks addressed in these papers, we categorized them into two major groups: medical-related, and non-medical-related. The medical-related tasks were further divided into seven subgroups, including blood pressure analysis, cardiovascular monitoring and diagnosis, sleep health, mental health, respiratory monitoring and analysis, blood glucose analysis, as well as others. The non-medical-related tasks were divided into four subgroups, which encompass signal processing, biometric identification, electrocardiogram reconstruction, and human activity recognition. In conclusion, significant progress has been made in the field of using deep learning methods to process PPG data recently. This allows for a more thorough exploration and utilization of the information contained in PPG signals. However, challenges remain, such as limited quantity and quality of publicly available databases, a lack of effective validation in real-world scenarios, and concerns about the interpretability, scalability, and complexity of deep learning models. Moreover, there are still emerging research areas that require further investigation.
- [166] arXiv:2401.12846 [ pdf , other ]
-
Title: How well can large language models explain business processes?Comments: 39 pages, 12 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) are likely to play a prominent role in future AI-augmented business process management systems (ABPMSs) catering functionalities across all system lifecycle stages. One such system's functionality is Situation-Aware eXplainability (SAX), which relates to generating causally sound and yet human-interpretable explanations that take into account the process context in which the explained condition occurred. In this paper, we present the SAX4BPM framework developed to generate SAX explanations. The SAX4BPM suite consists of a set of services and a central knowledge repository. The functionality of these services is to elicit the various knowledge ingredients that underlie SAX explanations. A key innovative component among these ingredients is the causal process execution view. In this work, we integrate the framework with an LLM to leverage its power to synthesize the various input ingredients for the sake of improved SAX explanations. Since the use of LLMs for SAX is also accompanied by a certain degree of doubt related to its capacity to adequately fulfill SAX along with its tendency for hallucination and lack of inherent capacity to reason, we pursued a methodological evaluation of the quality of the generated explanations. To this aim, we developed a designated scale and conducted a rigorous user study. Our findings show that the input presented to the LLMs aided with the guard-railing of its performance, yielding SAX explanations having better-perceived fidelity. This improvement is moderated by the perception of trust and curiosity. More so, this improvement comes at the cost of the perceived interpretability of the explanation.
- [167] arXiv:2401.12866 [ pdf , other ]
-
Title: Evaluating Collaborative and Autonomous Agents in Data-Stream-Supported Coordination of Mobile CrowdsourcingJournal-ref: Sensors 2023, 23(2), 614Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: Mobile crowdsourcing refers to systems where the completion of tasks necessarily requires physical movement of crowdworkers in an on-demand workforce. Evidence suggests that in such systems, tasks often get assigned to crowdworkers who struggle to complete those tasks successfully, resulting in high failure rates and low service quality. A promising solution to ensure higher quality of service is to continuously adapt the assignment and respond to failure-causing events by transferring tasks to better-suited workers who use different routes or vehicles. However, implementing task transfers in mobile crowdsourcing is difficult because workers are autonomous and may reject transfer requests. Moreover, task outcomes are uncertain and need to be predicted. In this paper, we propose different mechanisms to achieve outcome prediction and task coordination in mobile crowdsourcing. First, we analyze different data stream learning approaches for the prediction of task outcomes. Second, based on the suggested prediction model, we propose and evaluate two different approaches for task coordination with different degrees of autonomy: an opportunistic approach for crowdshipping with collaborative, but non-autonomous workers, and a market-based model with autonomous workers for crowdsensing.
- [168] arXiv:2401.12869 [ pdf , other ]
-
Title: TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic TasksSubjects: Artificial Intelligence (cs.AI)
Abstract: Language models (LMs) can solve tasks such as answering questions about tables or images by writing programs. However, using primitive functions often leads to verbose and error-prone programs, and higher-level functions require expert design. To enable better solutions without human labor, we ask code LMs to curate reusable high-level functions, and use them to write solutions. We present TROVE, a training-free method of inducing a verifiable and efficient toolbox of functions, by generating via using, growing, and periodically trimming the toolbox. On 11 datasets from math, table question answering, and image reasoning tasks, TROVE consistently yields simpler solutions with higher accuracy than baselines using CODELLAMA and previous methods using GPT, while using 79-98% smaller toolboxes. TROVE further enables 31% faster and 13% more accurate human verification than baselines. With the same pipeline, it creates diverse functions for varied tasks and datasets, providing insights into their individual characteristics.
- [169] arXiv:2401.12915 [ pdf , other ]
-
Title: Red Teaming Visual Language ModelsComments: Working in progressSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: VLMs (Vision-Language Models) extend the capabilities of LLMs (Large Language Models) to accept multimodal inputs. Since it has been verified that LLMs can be induced to generate harmful or inaccurate content through specific test cases (termed as Red Teaming), how VLMs perform in similar scenarios, especially with their combination of textual and visual inputs, remains a question. To explore this problem, we present a novel red teaming dataset RTVLM, which encompasses 10 subtasks (e.g., image misleading, multi-modal jail-breaking, face fairness, etc) under 4 primary aspects (faithfulness, privacy, safety, fairness). Our RTVLM is the first red-teaming dataset to benchmark current VLMs in terms of these 4 different aspects. Detailed analysis shows that 10 prominent open-sourced VLMs struggle with the red teaming in different degrees and have up to 31% performance gap with GPT-4V. Additionally, we simply apply red teaming alignment to LLaVA-v1.5 with Supervised Fine-tuning (SFT) using RTVLM, and this bolsters the models' performance with 10% in RTVLM test set, 13% in MM-Hal, and without noticeable decline in MM-Bench, overpassing other LLaVA-based models with regular alignment data. This reveals that current open-sourced VLMs still lack red teaming alignment. Our code and datasets will be open-source.
- [170] arXiv:2401.12917 [ pdf , other ]
-
Title: Active Inference as a Model of AgencyComments: Accepted in RLDM2022 for the workshop 'RL as a model of agency'Subjects: Artificial Intelligence (cs.AI)
Abstract: Is there a canonical way to think of agency beyond reward maximisation? In this paper, we show that any type of behaviour complying with physically sound assumptions about how macroscopic biological agents interact with the world canonically integrates exploration and exploitation in the sense of minimising risk and ambiguity about states of the world. This description, known as active inference, refines the free energy principle, a popular descriptive framework for action and perception originating in neuroscience. Active inference provides a normative Bayesian framework to simulate and model agency that is widely used in behavioural neuroscience, reinforcement learning (RL) and robotics. The usefulness of active inference for RL is three-fold. \emph{a}) Active inference provides a principled solution to the exploration-exploitation dilemma that usefully simulates biological agency. \emph{b}) It provides an explainable recipe to simulate behaviour, whence behaviour follows as an explainable mixture of exploration and exploitation under a generative world model, and all differences in behaviour are explicit in differences in world model. \emph{c}) This framework is universal in the sense that it is theoretically possible to rewrite any RL algorithm conforming to the descriptive assumptions of active inference as an active inference algorithm. Thus, active inference can be used as a tool to uncover and compare the commitments and assumptions of more specific models of agency.
- [171] arXiv:2401.12920 [ pdf , other ]
-
Title: Truck Parking Usage Prediction with Decomposed Graph Neural NetworksComments: 10 pages, 5 figures, 3 tables, Manuscript for IEEE Transactions on Intelligent Transportation SystemsSubjects: Artificial Intelligence (cs.AI)
Abstract: Truck parking on freight corridors faces various challenges, such as insufficient parking spaces and compliance with Hour-of-Service (HOS) regulations. These constraints often result in unauthorized parking practices, causing safety concerns. To enhance the safety of freight operations, providing accurate parking usage prediction proves to be a cost-effective solution. Despite the existing research demonstrating satisfactory accuracy for predicting individual truck parking site usage, few approaches have been proposed for predicting usage with spatial dependencies of multiple truck parking sites. We present the Regional Temporal Graph Neural Network (RegT-GCN) as a predictive framework for assessing parking usage across the entire state to provide better truck parking information and mitigate unauthorized parking. The framework leverages the topological structures of truck parking site distributions and historical parking data to predict occupancy rates across a state. To achieve this, we introduce a Regional Decomposition approach, which effectively captures the geographical characteristics. We also introduce the spatial module working efficiently with the temporal module. Evaluation results demonstrate that the proposed model surpasses other baseline models, improving the performance by more than $20\%$ compared with the original model. The proposed model allows truck parking sites' percipience of the topological structures and provides higher performance.
- [172] arXiv:2401.13006 [ pdf , other ]
-
Title: CIMGEN: Controlled Image Manipulation by Finetuning Pretrained Generative Models on Limited DataAuthors: Chandrakanth Gudavalli , Erik Rosten , Lakshmanan Nataraj , Shivkumar Chandrasekaran , B. S. ManjunathSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract: Content creation and image editing can benefit from flexible user controls. A common intermediate representation for conditional image generation is a semantic map, that has information of objects present in the image. When compared to raw RGB pixels, the modification of semantic map is much easier. One can take a semantic map and easily modify the map to selectively insert, remove, or replace objects in the map. The method proposed in this paper takes in the modified semantic map and alter the original image in accordance to the modified map. The method leverages traditional pre-trained image-to-image translation GANs, such as CycleGAN or Pix2Pix GAN, that are fine-tuned on a limited dataset of reference images associated with the semantic maps. We discuss the qualitative and quantitative performance of our technique to illustrate its capacity and possible applications in the fields of image forgery and image editing. We also demonstrate the effectiveness of the proposed image forgery technique in thwarting the numerous deep learning-based image forensic techniques, highlighting the urgent need to develop robust and generalizable image forensic tools in the fight against the spread of fake media.
- [173] arXiv:2401.13110 [ pdf , other ]
-
Title: XAI for All: Can Large Language Models Simplify Explainable AI?Authors: Philip Mavrepis , Georgios Makridis , Georgios Fatouros , Vasileios Koukos , Maria Margarita Separdani , Dimosthenis KyriazisSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: The field of Explainable Artificial Intelligence (XAI) often focuses on users with a strong technical background, making it challenging for non-experts to understand XAI methods. This paper presents "x-[plAIn]", a new approach to make XAI more accessible to a wider audience through a custom Large Language Model (LLM), developed using ChatGPT Builder. Our goal was to design a model that can generate clear, concise summaries of various XAI methods, tailored for different audiences, including business professionals and academics. The key feature of our model is its ability to adapt explanations to match each audience group's knowledge level and interests. Our approach still offers timely insights, facilitating the decision-making process by the end users. Results from our use-case studies show that our model is effective in providing easy-to-understand, audience-specific explanations, regardless of the XAI method used. This adaptability improves the accessibility of XAI, bridging the gap between complex AI technologies and their practical applications. Our findings indicate a promising direction for LLMs in making advanced AI concepts more accessible to a diverse range of users.
- [174] arXiv:2401.13112 [ pdf , other ]
-
Title: DISCOUNT: Distributional Counterfactual Explanation With Optimal TransportComments: Under review in ICML 2024Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (stat.ML)
Abstract: Counterfactual Explanations (CE) is the de facto method for providing insight and interpretability in black-box decision-making models by identifying alternative input instances that lead to different outcomes. This paper extends the concept of CEs to a distributional context, broadening the scope from individual data points to entire input and output distributions, named Distributional Counterfactual Explanation (DCE). In DCE, our focus shifts to analyzing the distributional properties of the factual and counterfactual, drawing parallels to the classical approach of assessing individual instances and their resulting decisions. We leverage Optimal Transport (OT) to frame a chance-constrained optimization problem, aiming to derive a counterfactual distribution that closely aligns with its factual counterpart, substantiated by statistical confidence. Our proposed optimization method, DISCOUNT, strategically balances this confidence across both input and output distributions. This algorithm is accompanied by an analysis of its convergence rate. The efficacy of our proposed method is substantiated through a series of illustrative case studies, highlighting its potential in providing deep insights into decision-making models.
- [175] arXiv:2401.13192 [ pdf , ps , other ]
-
Title: Generative Design of Crystal Structures by Point Cloud Representations and Diffusion ModelComments: I have submitted to a journalSubjects: Artificial Intelligence (cs.AI) ; Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG); Computational Physics (physics.comp-ph)
Abstract: Efficiently generating energetically stable crystal structures has long been a challenge in material design, primarily due to the immense arrangement of atoms in a crystal lattice. To facilitate the discovery of stable material, we present a framework for the generation of synthesizable materials, leveraging a point cloud representation to encode intricate structural information. At the heart of this framework lies the introduction of a diffusion model as its foundational pillar. To gauge the efficacy of our approach, we employ it to reconstruct input structures from our training datasets, rigorously validating its high reconstruction performance. Furthermore, we demonstrate the profound potential of Point Cloud-Based Crystal Diffusion (PCCD) by generating entirely new materials, emphasizing their synthesizability. Our research stands as a noteworthy contribution to the advancement of materials design and synthesis through the cutting-edge avenue of generative design instead of the conventional substitution or experience-based discovery.
- [176] arXiv:2401.13408 [ pdf , ps , other ]
-
Title: Causal PerceptionComments: arXiv admin note: text overlap with arXiv:2305.09535 by other authorsSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract: Perception occurs when two individuals interpret the same information differently. Despite being a known phenomenon with implications for bias in decision-making, as individuals' experience determines interpretation, perception remains largely overlooked in automated decision-making (ADM) systems. In particular, it can have considerable effects on the fairness or fair usage of an ADM system, as fairness itself is context-specific and its interpretation dependent on who is judging. In this work, we formalize perception under causal reasoning to capture the act of interpretation by an individual. We also formalize individual experience as additional causal knowledge that comes with and is used by an individual. Further, we define and discuss loaded attributes, which are attributes prone to evoke perception. Sensitive attributes, such as gender and race, are clear examples of loaded attributes. We define two kinds of causal perception, unfaithful and inconsistent, based on the causal properties of faithfulness and consistency. We illustrate our framework through a series of decision-making examples and discuss relevant fairness applications. The goal of this work is to position perception as a parameter of interest, useful for extending the standard, single interpretation ADM problem formulation.
- [177] arXiv:2401.13604 [ pdf , other ]
-
Title: Stream-based perception for cognitive agents in mobile ecosystemsJournal-ref: AI Communications, vol. 32, no. 4, pp. 271-286, 2019Subjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA)
Abstract: Cognitive agent abstractions can help to engineer intelligent systems across mobile devices. On smartphones, the data obtained from onboard sensors can give valuable insights into the user's current situation. Unfortunately, today's cognitive agent frameworks cannot cope well with the challenging characteristics of sensor data. Sensor data is located on a low abstraction level and the individual data elements are not meaningful when observed in isolation. In contrast, cognitive agents operate on high-level percepts and lack the means to effectively detect complex spatio-temporal patterns in sequences of multiple percepts. In this paper, we present a stream-based perception approach that enables the agents to perceive meaningful situations in low-level sensor data streams. We present a crowdshipping case study where autonomous, self-interested agents collaborate to deliver parcels to their destinations. We show how situations derived from smartphone sensor data can trigger and guide auctions, which the agents use to reach agreements. Experiments with real smartphone data demonstrate the benefits of stream-based agent perception.
- [178] arXiv:2401.13752 [ pdf , ps , other ]
-
Title: Explaining Image ClassifiersSubjects: Artificial Intelligence (cs.AI)
Abstract: We focus on explaining image classifiers, taking the work of Mothilal et al. [2021] (MMTS) as our point of departure. We observe that, although MMTS claim to be using the definition of explanation proposed by Halpern [2016], they do not quite do so. Roughly speaking, Halpern's definition has a necessity clause and a sufficiency clause. MMTS replace the necessity clause by a requirement that, as we show, implies it. Halpern's definition also allows agents to restrict the set of options considered. While these difference may seem minor, as we show, they can have a nontrivial impact on explanations. We also show that, essentially without change, Halpern's definition can handle two issues that have proved difficult for other approaches: explanations of absence (when, for example, an image classifier for tumors outputs "no tumor") and explanations of rare events (such as tumors).
- [179] arXiv:2401.13770 [ pdf , ps , other ]
-
Title: AlphaMapleSAT: An MCTS-based Cube-and-Conquer SAT Solver for Hard Combinatorial ProblemsSubjects: Artificial Intelligence (cs.AI) ; Combinatorics (math.CO)
Abstract: This paper introduces AlphaMapleSAT, a novel Monte Carlo Tree Search (MCTS) based Cube-and-Conquer (CnC) SAT solving method aimed at efficiently solving challenging combinatorial problems. Despite the tremendous success of CnC solvers in solving a variety of hard combinatorial problems, the lookahead cubing techniques at the heart of CnC have not evolved much for many years. Part of the reason is the sheer difficulty of coming up with new cubing techniques that are both low-cost and effective in partitioning input formulas into sub-formulas, such that the overall runtime is minimized.
Lookahead cubing techniques used by current state-of-the-art CnC solvers, such as March, keep their cubing costs low by constraining the search for the optimal splitting variables. By contrast, our key innovation is a deductively-driven MCTS-based lookahead cubing technique, that performs a deeper heuristic search to find effective cubes, while keeping the cubing cost low. We perform an extensive comparison of AlphaMapleSAT against the March CnC solver on challenging combinatorial problems such as the minimum Kochen-Specker and Ramsey problems. We also perform ablation studies to verify the efficacy of the MCTS heuristic search for the cubing problem. Results show up to 2.3x speedup in parallel (and up to 27x in sequential) elapsed real time. - [180] arXiv:2401.13883 [ pdf , other ]
-
Title: Domain-Independent Dynamic ProgrammingComments: Manuscript submitted to JACMSubjects: Artificial Intelligence (cs.AI)
Abstract: For combinatorial optimization problems, model-based paradigms such as mixed-integer programming (MIP) and constraint programming (CP) aim to decouple modeling and solving a problem: the `holy grail' of declarative problem solving. We propose domain-independent dynamic programming (DIDP), a new model-based paradigm based on dynamic programming (DP). While DP is not new, it has typically been implemented as a problem-specific method. We introduce Dynamic Programming Description Language (DyPDL), a formalism to define DP models based on a state transition system, inspired by AI planning. We show that heuristic search algorithms can be used to solve DyPDL models and propose seven DIDP solvers. We experimentally compare our DIDP solvers with commercial MIP and CP solvers (solving MIP and CP models, respectively) on common benchmark instances of eleven combinatorial optimization problem classes. We show that DIDP outperforms MIP in nine problem classes, CP also in nine problem classes, and both MIP and CP in seven.
- [181] arXiv:2401.13935 [ pdf , other ]
-
Title: A New Paradigm for Counterfactual Reasoning in Fairness and RecourseSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY); Machine Learning (stat.ML)
Abstract: Counterfactuals and counterfactual reasoning underpin numerous techniques for auditing and understanding artificial intelligence (AI) systems. The traditional paradigm for counterfactual reasoning in this literature is the interventional counterfactual, where hypothetical interventions are imagined and simulated. For this reason, the starting point for causal reasoning about legal protections and demographic data in AI is an imagined intervention on a legally-protected characteristic, such as ethnicity, race, gender, disability, age, etc. We ask, for example, what would have happened had your race been different? An inherent limitation of this paradigm is that some demographic interventions -- like interventions on race -- may not translate into the formalisms of interventional counterfactuals. In this work, we explore a new paradigm based instead on the backtracking counterfactual, where rather than imagine hypothetical interventions on legally-protected characteristics, we imagine alternate initial conditions while holding these characteristics fixed. We ask instead, what would explain a counterfactual outcome for you as you actually are or could be? This alternate framework allows us to address many of the same social concerns, but to do so while asking fundamentally different questions that do not rely on demographic interventions.
- [182] arXiv:2401.14086 [ pdf , other ]
-
Title: Generating Likely Counterfactuals Using Sum-Product NetworksSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract: Due to user demand and recent regulation (GDPR, AI Act), decisions made by AI systems need to be explained. These decisions are often explainable only post hoc, where counterfactual explanations are popular. The question of what constitutes the best counterfactual explanation must consider multiple aspects, where "distance from the sample" is the most common. We argue that this requirement frequently leads to explanations that are unlikely and, therefore, of limited value. Here, we present a system that provides high-likelihood explanations. We show that the search for the most likely explanations satisfying many common desiderata for counterfactual explanations can be modeled using mixed-integer optimization (MIO). In the process, we propose an MIO formulation of a Sum-Product Network (SPN) and use the SPN to estimate the likelihood of a counterfactual, which can be of independent interest. A numerical comparison against several methods for generating counterfactual explanations is provided.
- [183] arXiv:2401.14153 [ pdf , other ]
-
Title: Agent-based Simulation with Netlogo to Evaluate AmI ScenariosSubjects: Artificial Intelligence (cs.AI)
Abstract: In this paper an agent-based simulation is developed in order to evaluate an AmI scenario based on agents. Many AmI applications are implemented through agents but they are not compared to any other existing alternative in order to evaluate the relative benefits of using them. The proposal simulation environment developed in Netlogo analyse such benefits using two evaluation criteria: First, measuring agent satisfaction of different types of desires along the execution. Second, measuring time savings obtained through a correct use of context information.
So, here, a previously suggested agent architecture, an ontology and a 12-steps protocol to provide AmI services in airports, is evaluated using a NetLogo simulation environment. The present work uses a NetLogo model considering scalability problems of this application domain but using FIPA and BDI extensions to be coherent with our previous works and our previous JADE implementation of them.
The NetLogo model presented simulates an airport with agent users passing through several zones located in a specific order in a map: passport controls, check-in counters of airline companies, boarding gates, different types of shopping. Although initial data in simulations are generated randomly, and the model is just an approximation of real-world airports, the definition of this case of use of Ambient Intelligence through NetLogo agents opens an interesting way to evaluate the benefits of using Ambient Intelligence, which is a significant contribution to the final development of them. - [184] arXiv:2401.14183 [ pdf , other ]
-
Title: Towards Autonomous Supply Chains: Definition, Characteristics, Conceptual Framework, and Autonomy LevelsComments: This paper includes 20 pages and 8 figuresSubjects: Artificial Intelligence (cs.AI) ; Multiagent Systems (cs.MA); Systems and Control (eess.SY); Optimization and Control (math.OC)
Abstract: Recent global disruptions, such as the pandemic and geopolitical conflicts, have profoundly exposed vulnerabilities in traditional supply chains, requiring exploration of more resilient alternatives. Autonomous supply chains (ASCs) have emerged as a potential solution, offering increased visibility, flexibility, and resilience in turbulent trade environments. Despite discussions in industry and academia over several years, ASCs lack well-established theoretical foundations. This paper addresses this research gap by presenting a formal definition of ASC along with its defining characteristics and auxiliary concepts. We propose a layered conceptual framework called the MIISI model. An illustrative case study focusing on the meat supply chain demonstrates an initial ASC implementation based on this conceptual model. Additionally, we introduce a seven-level supply chain autonomy reference model, delineating a trajectory towards achieving a full supply chain autonomy. Recognising that this work represents an initial endeavour, we emphasise the need for continued exploration in this emerging domain. We anticipate that this work will stimulate further research, both theoretical and technical, and contribute to the continual evolution of ASCs.
- [185] arXiv:2401.14461 [ pdf , other ]
-
Title: Marabou 2.0: A Versatile Formal Analyzer of Neural NetworksAuthors: Haoze Wu , Omri Isac , Aleksandar Zeljić , Teruhiro Tagomori , Matthew Daggitt , Wen Kokke , Idan Refaeli , Guy Amir , Kyle Julian , Shahaf Bassan , Pei Huang , Ori Lahav , Min Wu , Min Zhang , Ekaterina Komendantskaya , Guy Katz , Clark BarrettComments: In submissionSubjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
Abstract: This paper serves as a comprehensive system description of version 2.0 of the Marabou framework for formal analysis of neural networks. We discuss the tool's architectural design and highlight the major features and components introduced since its initial release.
- [186] arXiv:2401.14511 [ pdf , ps , other ]
-
Title: Automated legal reasoning with discretion to act using s(LAW)Journal-ref: Artificial Intelligence and Law (2023)Subjects: Artificial Intelligence (cs.AI)
Abstract: Automated legal reasoning and its application in smart contracts and automated decisions are increasingly attracting interest. In this context, ethical and legal concerns make it necessary for automated reasoners to justify in human-understandable terms the advice given. Logic Programming, specially Answer Set Programming, has a rich semantics and has been used to very concisely express complex knowledge. However, modelling discretionality to act and other vague concepts such as ambiguity cannot be expressed in top-down execution models based on Prolog, and in bottom-up execution models based on ASP the justifications are incomplete and/or not scalable. We propose to use s(CASP), a top-down execution model for predicate ASP, to model vague concepts following a set of patterns. We have implemented a framework, called s(LAW), to model, reason, and justify the applicable legislation and validate it by translating (and benchmarking) a representative use case, the criteria for the admission of students in the "Comunidad de Madrid".
- [187] arXiv:2401.14636 [ pdf , other ]
-
Title: Efficient Constraint Generation for Stochastic Shortest Path ProblemsComments: Extended version of AAAI 2024 paperSubjects: Artificial Intelligence (cs.AI)
Abstract: Current methods for solving Stochastic Shortest Path Problems (SSPs) find states' costs-to-go by applying Bellman backups, where state-of-the-art methods employ heuristics to select states to back up and prune. A fundamental limitation of these algorithms is their need to compute the cost-to-go for every applicable action during each state backup, leading to unnecessary computation for actions identified as sub-optimal. We present new connections between planning and operations research and, using this framework, we address this issue of unnecessary computation by introducing an efficient version of constraint generation for SSPs. This technique allows algorithms to ignore sub-optimal actions and avoid computing their costs-to-go. We also apply our novel technique to iLAO* resulting in a new algorithm, CG-iLAO*. Our experiments show that CG-iLAO* ignores up to 57% of iLAO*'s actions and it solves problems up to 8x and 3x faster than LRTDP and iLAO*.
- [188] arXiv:2401.14743 [ pdf , ps , other ]
-
Title: Synthetic Multimodal Dataset for Empowering Safety and Well-being in Home EnvironmentsAuthors: Takanori Ugai , Shusaku Egami , Swe Nwe Nwe Htun , Kouji Kozaki , Takahiro Kawamura , Ken FukudaComments: 7 pages, 2 figures,4 tablesSubjects: Artificial Intelligence (cs.AI)
Abstract: This paper presents a synthetic multimodal dataset of daily activities that fuses video data from a 3D virtual space simulator with knowledge graphs depicting the spatiotemporal context of the activities. The dataset is developed for the Knowledge Graph Reasoning Challenge for Social Issues (KGRC4SI), which focuses on identifying and addressing hazardous situations in the home environment. The dataset is available to the public as a valuable resource for researchers and practitioners developing innovative solutions recognizing human behaviors to enhance safety and well-being in
- [189] arXiv:2401.14811 [ pdf , ps , other ]
-
Title: On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal TasksJournal-ref: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1974-1984, 2023Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: In this paper, we study the expressivity of scalar, Markovian reward functions in Reinforcement Learning (RL), and identify several limitations to what they can express. Specifically, we look at three classes of RL tasks; multi-objective RL, risk-sensitive RL, and modal RL. For each class, we derive necessary and sufficient conditions that describe when a problem in this class can be expressed using a scalar, Markovian reward. Moreover, we find that scalar, Markovian rewards are unable to express most of the instances in each of these three classes. We thereby contribute to a more complete understanding of what standard reward functions can and cannot express. In addition to this, we also call attention to modal problems as a new class of problems, since they have so far not been given any systematic treatment in the RL literature. We also briefly outline some approaches for solving some of the problems we discuss, by means of bespoke RL algorithms.
- [190] arXiv:2401.14923 [ pdf , other ]
-
Title: Reinforcement Learning Interventions on Boundedly Rational Human Agents in Frictionful TasksComments: In AAMAS 2024Subjects: Artificial Intelligence (cs.AI) ; Machine Learning (cs.LG)
Abstract: Many important behavior changes are frictionful; they require individuals to expend effort over a long period with little immediate gratification. Here, an artificial intelligence (AI) agent can provide personalized interventions to help individuals stick to their goals. In these settings, the AI agent must personalize rapidly (before the individual disengages) and interpretably, to help us understand the behavioral interventions. In this paper, we introduce Behavior Model Reinforcement Learning (BMRL), a framework in which an AI agent intervenes on the parameters of a Markov Decision Process (MDP) belonging to a boundedly rational human agent. Our formulation of the human decision-maker as a planning agent allows us to attribute undesirable human policies (ones that do not lead to the goal) to their maladapted MDP parameters, such as an extremely low discount factor. Furthermore, we propose a class of tractable human models that captures fundamental behaviors in frictionful tasks. Introducing a notion of MDP equivalence specific to BMRL, we theoretically and empirically show that AI planning with our human models can lead to helpful policies on a wide range of more complex, ground-truth humans.
- [191] arXiv:2401.14933 [ pdf , ps , other ]
-
Title: SSDOnt: an Ontology for representing Single-Subject Design StudiesComments: This document is the Accepted Manuscript version of a Published Work that appeared in final form in Methods of Information in Medicine 57(01/02) : 55-61 (2018), copyright 2018 Schattauer. To access the final edited and published work see this https URLJournal-ref: Methods of Information in Medicine 57(01/02) : 55-61 (2018)Subjects: Artificial Intelligence (cs.AI)
Abstract: Background: Single-Subject Design is used in several areas such as education and biomedicine. However, no suited formal vocabulary exists for annotating the detailed configuration and the results of this type of research studies with the appropriate granularity for looking for information about them. Therefore, the search for those study designs relies heavily on a syntactical search on the abstract, keywords or full text of the publications about the study, which entails some limitations. Objective: To present SSDOnt, a specific purpose ontology for describing and annotating single-subject design studies, so that complex questions can be asked about them afterwards. Methods: The ontology was developed following the NeOn methodology. Once the requirements of the ontology were defined, a formal model was described in a Description Logic and later implemented in the ontology language OWL 2 DL. Results: We show how the ontology provides a reference model with a suitable terminology for the annotation and searching of single-subject design studies and their main components, such as the phases, the intervention types, the outcomes and the results. Some mappings with terms of related ontologies have been established. We show as proof-of-concept that classes in the ontology can be easily extended to annotate more precise information about specific interventions and outcomes such as those related to autism. Moreover, we provide examples of some types of queries that can be posed to the ontology. Conclusions: SSDOnt has achieved the purpose of covering the descriptions of the domain of single-subject research studies.
- [192] arXiv:2401.15081 [ pdf , other ]
-
Title: Can generative AI and ChatGPT outperform humans on cognitive-demanding problem-solving tasks in science?Journal-ref: Science & Education, 2024Subjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: This study aimed to examine an assumption that generative artificial intelligence (GAI) tools can overcome the cognitive intensity that humans suffer when solving problems. We compared the performance of ChatGPT and GPT-4 on 2019 NAEP science assessments with students by cognitive demands of the items. Fifty-four tasks were coded by experts using a two-dimensional cognitive load framework, including task cognitive complexity and dimensionality. ChatGPT and GPT-4 responses were scored using the scoring keys of NAEP. The analysis of the available data was based on the average student ability scores for students who answered each item correctly and the percentage of students who responded to individual items. Results showed that both ChatGPT and GPT-4 consistently outperformed most students who answered the NAEP science assessments. As the cognitive demand for NAEP tasks increases, statistically higher average student ability scores are required to correctly address the questions. This pattern was observed for students in grades 4, 8, and 12, respectively. However, ChatGPT and GPT-4 were not statistically sensitive to the increase in cognitive demands of the tasks, except for Grade 4. As the first study focusing on comparing GAI and K-12 students in problem-solving in science, this finding implies the need for changes to educational objectives to prepare students with competence to work with GAI tools in the future. Education ought to emphasize the cultivation of advanced cognitive skills rather than depending solely on tasks that demand cognitive intensity. This approach would foster critical thinking, analytical skills, and the application of knowledge in novel contexts. Findings also suggest the need for innovative assessment practices by moving away from cognitive intensity tasks toward creativity and analytical skills to avoid the negative effects of GAI on testing more efficiently.
- [193] arXiv:2401.15188 [ pdf , other ]
-
Title: CAREForMe: Contextual Multi-Armed Bandit Recommendation Framework for Mental HealthAuthors: Sheng Yu , Narjes Nourzad , Randye J. Semple , Yixue Zhao , Emily Zhou , Bhaskar KrishnamachariComments: MOBILESoft 2024Subjects: Artificial Intelligence (cs.AI)
Abstract: The COVID-19 pandemic has intensified the urgency for effective and accessible mental health interventions in people's daily lives. Mobile Health (mHealth) solutions, such as AI Chatbots and Mindfulness Apps, have gained traction as they expand beyond traditional clinical settings to support daily life. However, the effectiveness of current mHealth solutions is impeded by the lack of context-awareness, personalization, and modularity to foster their reusability. This paper introduces CAREForMe, a contextual multi-armed bandit (CMAB) recommendation framework for mental health. Designed with context-awareness, personalization, and modularity at its core, CAREForMe harnesses mobile sensing and integrates online learning algorithms with user clustering capability to deliver timely, personalized recommendations. With its modular design, CAREForMe serves as both a customizable recommendation framework to guide future research, and a collaborative platform to facilitate interdisciplinary contributions in mHealth research. We showcase CAREForMe's versatility through its implementation across various platforms (e.g., Discord, Telegram) and its customization to diverse recommendation features.
- [194] arXiv:2401.15196 [ pdf , other ]
-
Title: Regularized Q-Learning with Linear Function ApproximationSubjects: Artificial Intelligence (cs.AI)
Abstract: Several successful reinforcement learning algorithms make use of regularization to promote multi-modal policies that exhibit enhanced exploration and robustness. With functional approximation, the convergence properties of some of these algorithms (e.g. soft Q-learning) are not well understood. In this paper, we consider a single-loop algorithm for minimizing the projected Bellman error with finite time convergence guarantees in the case of linear function approximation. The algorithm operates on two scales: a slower scale for updating the target network of the state-action values, and a faster scale for approximating the Bellman backups in the subspace of the span of basis vectors. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
- [195] arXiv:2401.15356 [ pdf , other ]
-
Title: A Statistical Framework for Measuring AI RelianceSubjects: Artificial Intelligence (cs.AI) ; Human-Computer Interaction (cs.HC)
Abstract: Humans frequently make decisions with the aid of artificially intelligent (AI) systems. A common pattern is for the AI to recommend an action to the human who retains control over the final decision. Researchers have identified ensuring that a human has appropriate reliance on an AI as a critical component of achieving complementary performance. We argue that the current definition of appropriate reliance used in such research lacks formal statistical grounding and can lead to contradictions. We propose a formal definition of reliance, based on statistical decision theory, which separates the concepts of reliance as the probability the decision-maker follows the AI's prediction from challenges a human may face in differentiating the signals and forming accurate beliefs about the situation. Our definition gives rise to a framework that can be used to guide the design and interpretation of studies on human-AI complementarity and reliance. Using recent AI-advised decision making studies from literature, we demonstrate how our framework can be used to separate the loss due to mis-reliance from the loss due to not accurately differentiating the signals. We evaluate these losses by comparing to a baseline and a benchmark for complementary performance defined by the expected payoff achieved by a rational decision-maker facing the same decision task as the behavioral decision-makers.
- [196] arXiv:2401.15443 [ pdf , other ]
-
Title: DiffuserLite: Towards Real-time Diffusion PlanningSubjects: Artificial Intelligence (cs.AI)
Abstract: Diffusion planning has been recognized as an effective decision-making paradigm in various domains. The capability of conditionally generating high-quality long-horizon trajectories makes it a promising research direction. However, existing diffusion planning methods suffer from low decision-making frequencies due to the expensive iterative sampling cost. To address this issue, we introduce DiffuserLite, a super fast and lightweight diffusion planning framework. DiffuserLite employs a planning refinement process (PRP) to generate coarse-to-fine-grained trajectories, significantly reducing the modeling of redundant information and leading to notable increases in decision-making frequency. Our experimental results demonstrate that DiffuserLite achieves a decision-making frequency of $122$Hz ($112.7$x faster than previous mainstream frameworks) and reaches state-of-the-art performance on D4RL benchmarks. In addition, our neat DiffuserLite framework can serve as a flexible plugin to enhance decision frequency in other diffusion planning algorithms, providing a structural design reference for future works. More details and visualizations are available at this https URL .
- [197] arXiv:2401.15621 [ pdf , other ]
-
Title: SNAP: Semantic Stories for Next Activity PredictionSubjects: Artificial Intelligence (cs.AI)
Abstract: Predicting the next activity in an ongoing process is one of the most common classification tasks in the business process management (BPM) domain. It allows businesses to optimize resource allocation, enhance operational efficiency, and aids in risk mitigation and strategic decision-making. This provides a competitive edge in the rapidly evolving confluence of BPM and AI. Existing state-of-the-art AI models for business process prediction do not fully capitalize on available semantic information within process event logs. As current advanced AI-BPM systems provide semantically-richer textual data, the need for novel adequate models grows. To address this gap, we propose the novel SNAP method that leverages language foundation models by constructing semantic contextual stories from the process historical event logs and using them for the next activity prediction. We compared the SNAP algorithm with nine state-of-the-art models on six benchmark datasets and show that SNAP significantly outperforms them, especially for datasets with high levels of semantic content.
- [198] arXiv:2401.15911 [ pdf , ps , other ]
-
Title: Distribution-consistency Structural Causal ModelsSubjects: Artificial Intelligence (cs.AI) ; Statistics Theory (math.ST)
Abstract: In the field of causal modeling, potential outcomes (PO) and structural causal models (SCMs) stand as the predominant frameworks. However, these frameworks face notable challenges in practically modeling counterfactuals, formalized as parameters of the joint distribution of potential outcomes. Counterfactual reasoning holds paramount importance in contemporary decision-making processes, especially in scenarios that demand personalized incentives based on the joint values of $(Y(0), Y(1))$. This paper begins with an investigation of the PO and SCM frameworks for modeling counterfactuals. Through the analysis, we identify an inherent model capacity limitation, termed as the ``degenerative counterfactual problem'', emerging from the consistency rule that is the cornerstone of both frameworks. To address this limitation, we introduce a novel \textit{distribution-consistency} assumption, and in alignment with it, we propose the Distribution-consistency Structural Causal Models (DiscoSCMs) offering enhanced capabilities to model counterfactuals. To concretely reveal the enhanced model capacity, we introduce a new identifiable causal parameter, \textit{the probability of consistency}, which holds practical significance within DiscoSCM alone, showcased with a personalized incentive example. Furthermore, we provide a comprehensive set of theoretical results about the ``Ladder of Causation'' within the DiscoSCM framework. We hope it opens new avenues for future research of counterfactual modeling, ultimately enhancing our understanding of causality and its real-world applications.
- [199] arXiv:2401.16045 [ pdf , other ]
-
Title: Type-based Neural Link Prediction Adapter for Complex Query AnsweringComments: 11 pages, 3 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Answering complex logical queries on incomplete knowledge graphs (KGs) is a fundamental and challenging task in multi-hop reasoning. Recent work defines this task as an end-to-end optimization problem, which significantly reduces the training cost and enhances the generalization of the model by a pretrained link predictors for query answering. However, most existing proposals ignore the critical semantic knowledge inherently available in KGs, such as type information, which could help answer complex logical queries. To this end, we propose TypE-based Neural Link Prediction Adapter (TENLPA), a novel model that constructs type-based entity-relation graphs to discover the latent relationships between entities and relations by leveraging type information in KGs. Meanwhile, in order to effectively combine type information with complex logical queries, an adaptive learning mechanism is introduced, which is trained by back-propagating during the complex query answering process to achieve adaptive adjustment of neural link predictors. Experiments on 3 standard datasets show that TENLPA model achieves state-of-the-art performance on complex query answering with good generalization and robustness.
- [200] arXiv:2401.16119 [ pdf , other ]
-
Title: Triple Disentangled Representation Learning for Multimodal Affective AnalysisComments: 14 pages, 6 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: Multimodal learning has exhibited a significant advantage in affective analysis tasks owing to the comprehensive information of various modalities, particularly the complementary information. Thus, many emerging studies focus on disentangling the modality-invariant and modality-specific representations from input data and then fusing them for prediction. However, our study shows that modality-specific representations may contain information that is irrelevant or conflicting with the tasks, which downgrades the effectiveness of learned multimodal representations. We revisit the disentanglement issue, and propose a novel triple disentanglement approach, TriDiRA, which disentangles the modality-invariant, effective modality-specific and ineffective modality-specific representations from input data. By fusing only the modality-invariant and effective modality-specific representations, TriDiRA can significantly alleviate the impact of irrelevant and conflicting information across modalities during model training. Extensive experiments conducted on four benchmark datasets demonstrate the effectiveness and generalization of our triple disentanglement, which outperforms SOTA methods.
- [201] arXiv:2401.16124 [ pdf , other ]
-
Title: On the generalization of learned constraints for ASP solving in temporal domainsComments: 28 pages, 3 figuresSubjects: Artificial Intelligence (cs.AI)
Abstract: The representation of a dynamic problem in ASP usually boils down to using copies of variables and constraints, one for each time stamp, no matter whether it is directly encoded or via an action or temporal language. The multiplication of variables and constraints is commonly done during grounding and the solver is completely ignorant about the temporal relationship among the different instances. On the other hand, a key factor in the performance of today's ASP solvers is conflict-driven constraint learning. Our question is now whether a constraint learned for particular time steps can be generalized and reused at other time stamps, and ultimately whether this enhances the overall solver performance on temporal problems. Knowing full well the domain of time, we study conditions under which learned dynamic constraints can be generalized. We propose a simple translation of the original logic program such that, for the translated programs, the learned constraints can be generalized to other time points. Additionally, we identify a property of temporal problems that allows us to generalize all learned constraints to all time steps. It turns out that this property is satisfied by many planning problems. Finally, we empirically evaluate the impact of adding the generalized constraints to an ASP solver
- [202] arXiv:2401.16270 [ pdf , other ]
-
Title: Capturing Knowledge Graphs and Rules with Octagon EmbeddingsSubjects: Artificial Intelligence (cs.AI)
Abstract: Region based knowledge graph embeddings represent relations as geometric regions. This has the advantage that the rules which are captured by the model are made explicit, making it straightforward to incorporate prior knowledge and to inspect learned models. Unfortunately, existing approaches are severely restricted in their ability to model relational composition, and hence also their ability to model rules, thus failing to deliver on the main promise of region based models. With the aim of addressing these limitations, we investigate regions which are composed of axis-aligned octagons. Such octagons are particularly easy to work with, as intersections and compositions can be straightforwardly computed, while they are still sufficiently expressive to model arbitrary knowledge graphs. Among others, we also show that our octagon embeddings can properly capture a non-trivial class of rule bases. Finally, we show that our model achieves competitive experimental results.
- [203] arXiv:2401.16287 [ pdf , other ]
-
Title: GAPS: Geometry-Aware Problem SolverSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Geometry problem solving presents a formidable challenge within the NLP community. Existing approaches often rely on models designed for solving math word problems, neglecting the unique characteristics of geometry math problems. Additionally, the current research predominantly focuses on geometry calculation problems, while overlooking other essential aspects like proving. In this study, we address these limitations by proposing the Geometry-Aware Problem Solver (GAPS) model. GAPS is specifically designed to generate solution programs for geometry math problems of various types with the help of its unique problem-type classifier. To achieve this, GAPS treats the solution program as a composition of operators and operands, segregating their generation processes. Furthermore, we introduce the geometry elements enhancement method, which enhances the ability of GAPS to recognize geometry elements accurately. By leveraging these improvements, GAPS showcases remarkable performance in resolving geometry math problems. Our experiments conducted on the UniGeo dataset demonstrate the superiority of GAPS over the state-of-the-art model, Geoformer. Specifically, GAPS achieves an accuracy improvement of more than 5.3% for calculation tasks and an impressive 41.1% for proving tasks. Notably, GAPS achieves an impressive accuracy of 97.5% on proving problems, representing a significant advancement in solving geometry proving tasks.
- [204] arXiv:2401.16398 [ pdf , other ]
-
Title: Zero-shot Imitation Policy via Search in Demonstration DatasetSubjects: Artificial Intelligence (cs.AI)
Abstract: Behavioral cloning uses a dataset of demonstrations to learn a policy. To overcome computationally expensive training procedures and address the policy adaptation problem, we propose to use latent spaces of pre-trained foundation models to index a demonstration dataset, instantly access similar relevant experiences, and copy behavior from these situations. Actions from a selected similar situation can be performed by the agent until representations of the agent's current situation and the selected experience diverge in the latent space. Thus, we formulate our control problem as a dynamic search problem over a dataset of experts' demonstrations. We test our approach on BASALT MineRL-dataset in the latent representation of a Video Pre-Training model. We compare our model to state-of-the-art, Imitation Learning-based Minecraft agents. Our approach can effectively recover meaningful demonstrations and show human-like behavior of an agent in the Minecraft environment in a wide variety of scenarios. Experimental results reveal that performance of our search-based approach clearly wins in terms of accuracy and perceptual evaluation over learning-based models.
- [205] arXiv:2401.16412 [ pdf , other ]
-
Title: Learning to Manipulate under Limited InformationComments: Appears at the 1st Workshop on Social Choice and Learning Algorithms (SCaLA 2024) held at the 23rd International Conference on Autonomous Agents and Multiagent Systems, organized by B. Armstrong, R. Fairstein, N. Mattei, and Z. Terzopoulou, May 6-7, 2024, Auckland, New ZealandSubjects: Artificial Intelligence (cs.AI) ; Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH)
Abstract: By classic results in social choice theory, any reasonable preferential voting method sometimes gives individuals an incentive to report an insincere preference. The extent to which different voting methods are more or less resistant to such strategic manipulation has become a key consideration for comparing voting methods. Here we measure resistance to manipulation by whether neural networks of varying sizes can learn to profitably manipulate a given voting method in expectation, given different types of limited information about how other voters will vote. We trained over 70,000 neural networks of 26 sizes to manipulate against 8 different voting methods, under 6 types of limited information, in committee-sized elections with 5-21 voters and 3-6 candidates. We find that some voting methods, such as Borda, are highly manipulable by networks with limited information, while others, such as Instant Runoff, are not, despite being quite profitably manipulated by an ideal manipulator with full information. For the two probability models for elections that we use, the overall least manipulable of the 8 methods we study are Condorcet methods, namely Minimax and Split Cycle.
- [206] arXiv:2401.16580 [ pdf , other ]
-
Title: Attention-based Reinforcement Learning for Combinatorial Optimization: Application to Job Shop Scheduling ProblemSubjects: Artificial Intelligence (cs.AI)
Abstract: Job shop scheduling problems represent a significant and complex facet of combinatorial optimization problems, which have traditionally been addressed through either exact or approximate solution methodologies. However, the practical application of these solutions is often challenged due to the complexity of real-world problems. Even when utilizing an approximate solution approach, the time required to identify a near-optimal solution can be prohibitively extensive, and the solutions derived are generally not applicable to new problems. This study proposes an innovative attention-based reinforcement learning method specifically designed for the category of job shop scheduling problems. This method integrates a policy gradient reinforcement learning approach with a modified transformer architecture. A key finding of this research is the ability of our trained learners within the proposed method to be repurposed for larger-scale problems that were not part of the initial training set. Furthermore, empirical evidence demonstrates that our approach surpasses the results of recent studies and outperforms commonly implemented heuristic rules. This suggests that our method offers a promising avenue for future research and practical application in the field of job shop scheduling problems.
- [207] arXiv:2401.16657 [ pdf , other ]
-
Title: Recovering Mental Representations from Large Language Models with Markov Chain Monte CarloSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: Simulating sampling algorithms with people has proven a useful method for efficiently probing and understanding their mental representations. We propose that the same methods can be used to study the representations of Large Language Models (LLMs). While one can always directly prompt either humans or LLMs to disclose their mental representations introspectively, we show that increased efficiency can be achieved by using LLMs as elements of a sampling algorithm. We explore the extent to which we recover human-like representations when LLMs are interrogated with Direct Sampling and Markov chain Monte Carlo (MCMC). We found a significant increase in efficiency and performance using adaptive sampling algorithms based on MCMC. We also highlight the potential of our method to yield a more general method of conducting Bayesian inference \textit{with} LLMs.
- [208] arXiv:2401.16744 [ pdf , other ]
-
Title: ShaRP: Explaining Rankings with Shapley ValuesComments: 8 pagesSubjects: Artificial Intelligence (cs.AI) ; Computers and Society (cs.CY)
Abstract: Algorithmic decisions in critical domains such as hiring, college admissions, and lending are often based on rankings. Because of the impact these decisions have on individuals, organizations, and population groups, there is a need to understand them: to know whether the decisions are abiding by the law, to help individuals improve their rankings, and to design better ranking procedures.
In this paper, we present ShaRP (Shapley for Rankings and Preferences), a framework that explains the contributions of features to different aspects of a ranked outcome, and is based on Shapley values. Using ShaRP, we show that even when the scoring function used by an algorithmic ranker is known and linear, the weight of each feature does not correspond to its Shapley value contribution. The contributions instead depend on the feature distributions, and on the subtle local interactions between the scoring features. ShaRP builds on the Quantitative Input Influence framework, and can compute the contributions of features for multiple Quantities of Interest, including score, rank, pair-wise preference, and top-k. Because it relies on black-box access to the ranker, ShaRP can be used to explain both score-based and learned ranking models. We show results of an extensive experimental validation of ShaRP using real and synthetic datasets, showcasing its usefulness for qualitative analysis. - [209] arXiv:2401.17044 [ pdf , other ]
-
Title: Scalable Mechanism Design for Multi-Agent Path FindingAuthors: Paul Friedrich , Yulun Zhang , Michael Curry , Ludwig Dierks , Stephen McAleer , Jiaoyang Li , Tuomas Sandholm , Sven SeukenComments: 12 pages, 5 figures. Submitted to IJCAI'24Subjects: Artificial Intelligence (cs.AI) ; Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Abstract: Multi-Agent Path Finding (MAPF) involves determining paths for multiple agents to travel simultaneously through a shared area toward particular goal locations. This problem is computationally complex, especially when dealing with large numbers of agents, as is common in realistic applications like autonomous vehicle coordination. Finding an optimal solution is often computationally infeasible, making the use of approximate algorithms essential. Adding to the complexity, agents might act in a self-interested and strategic way, possibly misrepresenting their goals to the MAPF algorithm if it benefits them. Although the field of mechanism design offers tools to align incentives, using these tools without careful consideration can fail when only having access to approximately optimal outcomes. Since approximations are crucial for scalable MAPF algorithms, this poses a significant challenge. In this work, we introduce the problem of scalable mechanism design for MAPF and propose three strategyproof mechanisms, two of which even use approximate MAPF algorithms. We test our mechanisms on realistic MAPF domains with problem sizes ranging from dozens to hundreds of agents. Our findings indicate that they improve welfare beyond a simple baseline.
- [210] arXiv:2401.17045 [ pdf , ps , other ]
-
Title: Explaining Explanations in Probabilistic Logic ProgrammingAuthors: Germán VidalSubjects: Artificial Intelligence (cs.AI) ; Programming Languages (cs.PL)
Abstract: The emergence of tools based on artificial intelligence has also led to the need of producing explanations which are understandable by a human being. In some approaches, the system is not transparent (often referred to as a "black box"), making it difficult to generate appropriate explanations. In this work, though, we consider probabilistic logic programming, a combination of logic programming (for knowledge representation) and probability (to model uncertainty). In this setting, one can say that models are interpretable, which eases its understanding. However, given a particular query, the usual notion of "explanation" is associated with a set of choices, one for each random variable of the model. Unfortunately, this set does not have a causal structure and, in fact, some of the choices are actually irrelevant to the considered query. In order to overcome these shortcomings, we present an approach to explaining explanations which is based on the definition of a query-driven inference mechanism for probabilistic logic programs.
- [211] arXiv:2401.17159 [ pdf , other ]
-
Title: Layered and Staged Monte Carlo Tree Search for SMT Strategy SynthesisSubjects: Artificial Intelligence (cs.AI) ; Logic in Computer Science (cs.LO); Software Engineering (cs.SE)
Abstract: Modern SMT solvers, such as Z3, offer user-controllable strategies, enabling users to tailor them for their unique set of instances, thus dramatically enhancing solver performance for their use case. However, this approach of strategy customization presents a significant challenge: handcrafting an optimized strategy for a class of SMT instances remains a complex and demanding task for both solver developers and users alike.
In this paper, we address this problem of automatic SMT strategy synthesis via a novel Monte Carlo Tree Search (MCTS) based method. Our method treats strategy synthesis as a sequential decision-making process, whose search tree corresponds to the strategy space, and employs MCTS to navigate this vast search space. The key innovations that enable our method to identify effective strategies, while keeping costs low, are the ideas of layered and staged MCTS search. These novel approaches allow for a deeper and more efficient exploration of the strategy space, enabling us to synthesize more effective strategies than the default ones in state-of-the-art (SOTA) SMT solvers. We implement our method, dubbed Z3alpha, as part of the Z3 SMT solver. Through extensive evaluations across 6 important SMT logics, Z3alpha demonstrates superior performance compared to the SOTA synthesis tool FastSMT, the default Z3 solver, and the CVC5 solver on most benchmarks. Remarkably, on a challenging QF_BV benchmark set, Z3alpha solves 42.7% more instances than the default strategy in the Z3 SMT solver. - [212] arXiv:2401.17436 [ pdf , other ]
-
Title: Difficulty Modelling in Mobile Puzzle Games: An Empirical Study on Different Methods to Combine Player Analytics and Simulated DataSubjects: Artificial Intelligence (cs.AI)
Abstract: Difficulty is one of the key drivers of player engagement and it is often one of the aspects that designers tweak most to optimise the player experience; operationalising it is, therefore, a crucial task for game development studios. A common practice consists of creating metrics out of data collected by player interactions with the content; however, this allows for estimation only after the content is released and does not consider the characteristics of potential future players.
In this article, we present a number of potential solutions for the estimation of difficulty under such conditions, and we showcase the results of a comparative study intended to understand which method and which types of data perform better in different scenarios.
The results reveal that models trained on a combination of cohort statistics and simulated data produce the most accurate estimations of difficulty in all scenarios. Furthermore, among these models, artificial neural networks show the most consistent results. - [213] arXiv:2401.17511 [ pdf , other ]
-
Title: Linguistically Communicating Uncertainty in Patient-Facing Risk Prediction ModelsSubjects: Artificial Intelligence (cs.AI) ; Computation and Language (cs.CL)
Abstract: This paper addresses the unique challenges associated with uncertainty quantification in AI models when applied to patient-facing contexts within healthcare. Unlike traditional eXplainable Artificial Intelligence (XAI) methods tailored for model developers or domain experts, additional considerations of communicating in natural language, its presentation and evaluating understandability are necessary. We identify the challenges in communication model performance, confidence, reasoning and unknown knowns using natural language in the context of risk prediction. We propose a design aimed at addressing these challenges, focusing on the specific application of in-vitro fertilisation outcome prediction.
- [214] arXiv:2401.17527 [ pdf , other ]
-
Title: Learning to Stop Cut Generation for Efficient Mixed-Integer Linear ProgrammingSubjects: Artificial Intelligence (cs.AI)
Abstract: Cutting planes (cuts) play an important role in solving mixed-integer linear programs (MILPs), as they significantly tighten the dual bounds and improve the solving performance. A key problem for cuts is when to stop cuts generation, which is important for the efficiency of solving MILPs. However, many modern MILP solvers employ hard-coded heuristics to tackle this problem, which tends to neglect underlying patterns among MILPs from certain applications. To address this challenge, we formulate the cuts generation stopping problem as a reinforcement learning problem and propose a novel hybrid graph representation model (HYGRO) to learn effective stopping strategies. An appealing feature of HYGRO is that it can effectively capture both the dynamic and static features of MILPs, enabling dynamic decision-making for the stopping strategies. To the best of our knowledge, HYGRO is the first data-driven method to tackle the cuts generation stopping problem. By integrating our approach with modern solvers, experiments demonstrate that HYGRO significantly improves the efficiency of solving MILPs compared to competitive baselines, achieving up to 31% improvement.
- [215] arXiv:2401.17710 [ pdf , other ]
-
Title: Aesthetic Preference Prediction in Interior Design: Fuzzy ApproachComments: Submitted to IEEE conference for considerationSubjects: Artificial Intelligence (cs.AI)
Abstract: Interior design is all about creating spaces that look and feel good. However, the subjective nature of aesthetic preferences presents a significant challenge in defining and quantifying what makes an interior design visually appealing. The current paper addresses this gap by introducing a novel methodology for quantifying and predicting aesthetic preferences in interior design. Our study combines fuzzy logic with image processing techniques. We collected a dataset of interior design images from social media platforms, focusing on essential visual attributes such as color harmony, lightness, and complexity. We integrate these features using weighted average to compute a general aesthetic score. Our approach considers individual color preferences in calculating the overall aesthetic preference. We initially gather user ratings for primary colors like red, brown, and others to understand their preferences. Then, we use the pixel count of the top five dominant colors in the image to get the color scheme preference. The color scheme preference and the aesthetic score are then passed as inputs to the fuzzy inference system to calculate an overall preference score. This score represents a comprehensive measure of the user's preference for a particular interior design, considering their color choices and general aesthetic appeal. We used the 2AFC (Two-Alternative Forced Choice) method to validate our methodology, achieving a notable hit rate of 0.7. This study can help designers and professionals better understand and meet people's interior design preferences, especially in a world that relies heavily on digital media.
- [216] arXiv:2401.17749 [ pdf , other ]
-
Title: SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language modelsSubjects: Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have recently garnered significant accomplishments in various exploratory tasks, even surpassing the performance of traditional reinforcement learning-based methods that have historically dominated the agent-based field. The purpose of this paper is to investigate the efficacy of LLMs in executing real-time strategy war tasks within the StarCraft II gaming environment. In this paper, we introduce SwarmBrain, an embodied agent leveraging LLM for real-time strategy implementation in the StarCraft II game environment. The SwarmBrain comprises two key components: 1) a Overmind Intelligence Matrix, powered by state-of-the-art LLMs, is designed to orchestrate macro-level strategies from a high-level perspective. This matrix emulates the overarching consciousness of the Zerg intelligence brain, synthesizing strategic foresight with the aim of allocating resources, directing expansion, and coordinating multi-pronged assaults. 2) a Swarm ReflexNet, which is agile counterpart to the calculated deliberation of the Overmind Intelligence Matrix. Due to the inherent latency in LLM reasoning, the Swarm ReflexNet employs a condition-response state machine framework, enabling expedited tactical responses for fundamental Zerg unit maneuvers. In the experimental setup, SwarmBrain is in control of the Zerg race in confrontation with an Computer-controlled Terran adversary. Experimental results show the capacity of SwarmBrain to conduct economic augmentation, territorial expansion, and tactical formulation, and it shows the SwarmBrain is capable of achieving victory against Computer players set at different difficulty levels.
- [217] arXiv:2401.17783 [ pdf , other ]
-
Title: SDRDPy: An application to graphically visualize the knowledge obtained with supervised descriptive rule algorithmsSubjects: Artificial Intelligence (cs.AI)
Abstract: SDRDPy is a desktop application that allows experts an intuitive graphic and tabular representation of the knowledge extracted by any supervised descriptive rule discovery algorithm. The application is able to provide an analysis of the data showing the relevant information of the data set and the relationship between the rules, data and the quality measures associated for each rule regardless of the tool where algorithm has been executed. All of the information is presented in a user-friendly application in order to facilitate expert analysis and also the exportation of reports in different formats.
- [218] arXiv:2401.00015 (cross-list from cs.LG) [ pdf , other ]
-
Title: Distributional Reinforcement Learning-based Energy Arbitrage Strategies in Imbalance Settlement MechanismSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Growth in the penetration of renewable energy sources makes supply more uncertain and leads to an increase in the system imbalance. This trend, together with the single imbalance pricing, opens an opportunity for balance responsible parties (BRPs) to perform energy arbitrage in the imbalance settlement mechanism. To this end, we propose a battery control framework based on distributional reinforcement learning (DRL). Our proposed control framework takes a risk-sensitive perspective, allowing BRPs to adjust their risk preferences: we aim to optimize a weighted sum of the arbitrage profit and a risk measure while constraining the daily number of cycles for the battery. We assess the performance of our proposed control framework using the Belgian imbalance prices of 2022 and compare two state-of-the-art RL methods, deep Q learning and soft actor-critic. Results reveal that the distributional soft actor-critic method can outperform other methods. Moreover, we note that our fully risk-averse agent appropriately learns to hedge against the risk related to the unknown imbalance price by (dis)charging the battery only when the agent is more certain about the price.
- [219] arXiv:2401.00031 (cross-list from cs.LG) [ pdf , other ]
-
Title: Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and ChallengesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Decision-making is a dynamic process requiring perception, memory, and reasoning to make choices and find optimal policies. Traditional approaches to decision-making suffer from sample efficiency and generalization, while large-scale self-supervised pretraining has enabled fast adaptation with fine-tuning or few-shot learning in language and vision. We thus argue to integrate knowledge acquired from generic large-scale self-supervised pretraining into downstream decision-making problems. We propose Pretrain-Then-Adapt pipeline and survey recent work on data collection, pretraining objectives and adaptation strategies for decision-making pretraining and downstream inference. Finally, we identify critical challenges and future directions for developing decision foundation model with the help of generic and flexible self-supervised pretraining.
- [220] arXiv:2401.00104 (cross-list from cs.LG) [ pdf , other ]
-
Title: Causal State Distillation for Explainable Reinforcement LearningAuthors: Wenhao Lu , Xufeng Zhao , Thilo Fryen , Jae Hee Lee , Mengdi Li , Sven Magg , Stefan WermterComments: this https URL ; Accepted as oral by CLeaR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME)
Abstract: Reinforcement learning (RL) is a powerful technique for training intelligent agents, but understanding why these agents make specific decisions can be quite challenging. This lack of transparency in RL models has been a long-standing problem, making it difficult for users to grasp the reasons behind an agent's behaviour. Various approaches have been explored to address this problem, with one promising avenue being reward decomposition (RD). RD is appealing as it sidesteps some of the concerns associated with other methods that attempt to rationalize an agent's behaviour in a post-hoc manner. RD works by exposing various facets of the rewards that contribute to the agent's objectives during training. However, RD alone has limitations as it primarily offers insights based on sub-rewards and does not delve into the intricate cause-and-effect relationships that occur within an RL agent's neural model. In this paper, we present an extension of RD that goes beyond sub-rewards to provide more informative explanations. Our approach is centred on a causal learning framework that leverages information-theoretic measures for explanation objectives that encourage three crucial properties of causal factors: causal sufficiency, sparseness, and orthogonality. These properties help us distill the cause-and-effect relationships between the agent's states and actions or rewards, allowing for a deeper understanding of its decision-making processes. Our framework is designed to generate local explanations and can be applied to a wide range of RL tasks with multiple reward channels. Through a series of experiments, we demonstrate that our approach offers more meaningful and insightful explanations for the agent's action selections.
- [221] arXiv:2401.00110 (cross-list from cs.CV) [ pdf , other ]
-
Title: Diffusion Model with Perceptual LossSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Diffusion models trained with mean squared error loss tend to generate unrealistic samples. Current state-of-the-art models rely on classifier-free guidance to improve sample quality, yet its surprising effectiveness is not fully understood. In this paper, we show that the effectiveness of classifier-free guidance partly originates from it being a form of implicit perceptual guidance. As a result, we can directly incorporate perceptual loss in diffusion training to improve sample quality. Since the score matching objective used in diffusion training strongly resembles the denoising autoencoder objective used in unsupervised training of perceptual networks, the diffusion model itself is a perceptual network and can be used to generate meaningful perceptual loss. We propose a novel self-perceptual objective that results in diffusion models capable of generating more realistic samples. For conditional generation, our method only improves sample quality without entanglement with the conditional input and therefore does not sacrifice sample diversity. Our method can also improve sample quality for unconditional generation, which was not possible with classifier-free guidance before.
- [222] arXiv:2401.00132 (cross-list from cs.MA) [ pdf , other ]
-
Title: Contrastive learning-based agent modeling for deep reinforcement learningComments: 8 pages, 6 figuresSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: Multi-agent systems often require agents to collaborate with or compete against other agents with diverse goals, behaviors, or strategies. Agent modeling is essential when designing adaptive policies for intelligent machine agents in multiagent systems, as this is the means by which the ego agent understands other agents' behavior and extracts their meaningful policy representations. These representations can be used to enhance the ego agent's adaptive policy which is trained by reinforcement learning. However, existing agent modeling approaches typically assume the availability of local observations from other agents (modeled agents) during training or a long observation trajectory for policy adaption. To remove these constrictive assumptions and improve agent modeling performance, we devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations from the ego agent during training and execution. With these observations, CLAM is capable of generating consistent high-quality policy representations in real-time right from the beginning of each episode. We evaluated the efficacy of our approach in both cooperative and competitive multi-agent environments. Our experiments demonstrate that our approach achieves state-of-the-art on both cooperative and competitive tasks, highlighting the potential of contrastive learning-based agent modeling for enhancing reinforcement learning.
- [223] arXiv:2401.00158 (cross-list from cs.CL) [ pdf , other ]
-
Title: ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained Language Models for Question Answering over Knowledge GraphComments: EMNLP-23-Main; simple but effective SOTA on CWQ under a weak-supervised settingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Question Answering over Knowledge Graph (KGQA) aims to seek answer entities for the natural language question from a large-scale Knowledge Graph~(KG). To better perform reasoning on KG, recent work typically adopts a pre-trained language model~(PLM) to model the question, and a graph neural network~(GNN) based module to perform multi-hop reasoning on the KG. Despite the effectiveness, due to the divergence in model architecture, the PLM and GNN are not closely integrated, limiting the knowledge sharing and fine-grained feature interactions. To solve it, we aim to simplify the above two-module approach, and develop a more capable PLM that can directly support subgraph reasoning for KGQA, namely ReasoningLM. In our approach, we propose a subgraph-aware self-attention mechanism to imitate the GNN for performing structured reasoning, and also adopt an adaptation tuning strategy to adapt the model parameters with 20,000 subgraphs with synthesized questions. After adaptation, the PLM can be parameter-efficient fine-tuned on downstream tasks. Experiments show that ReasoningLM surpasses state-of-the-art models by a large margin, even with fewer updated parameters and less training data. Our codes and data are publicly available at~\url{ this https URL }.
- [224] arXiv:2401.00161 (cross-list from cs.LG) [ pdf , other ]
-
Title: DiffHybrid-UQ: Uncertainty Quantification for Differentiable Hybrid Neural ModelingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The hybrid neural differentiable models mark a significant advancement in the field of scientific machine learning. These models, integrating numerical representations of known physics into deep neural networks, offer enhanced predictive capabilities and show great potential for data-driven modeling of complex physical systems. However, a critical and yet unaddressed challenge lies in the quantification of inherent uncertainties stemming from multiple sources. Addressing this gap, we introduce a novel method, DiffHybrid-UQ, for effective and efficient uncertainty propagation and estimation in hybrid neural differentiable models, leveraging the strengths of deep ensemble Bayesian learning and nonlinear transformations. Specifically, our approach effectively discerns and quantifies both aleatoric uncertainties, arising from data noise, and epistemic uncertainties, resulting from model-form discrepancies and data sparsity. This is achieved within a Bayesian model averaging framework, where aleatoric uncertainties are modeled through hybrid neural models. The unscented transformation plays a pivotal role in enabling the flow of these uncertainties through the nonlinear functions within the hybrid model. In contrast, epistemic uncertainties are estimated using an ensemble of stochastic gradient descent (SGD) trajectories. This approach offers a practical approximation to the posterior distribution of both the network parameters and the physical parameters. Notably, the DiffHybrid-UQ framework is designed for simplicity in implementation and high scalability, making it suitable for parallel computing environments. The merits of the proposed method have been demonstrated through problems governed by both ordinary and partial differentiable equations.
- [225] arXiv:2401.00209 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: AI and Tempo Estimation: A ReviewAuthors: Geoff LuckComments: 9 pagesSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: The author's goal in this paper is to explore how artificial intelligence (AI) has been utilised to inform our understanding of and ability to estimate at scale a critical aspect of musical creativity - musical tempo. The central importance of tempo to musical creativity can be seen in how it is used to express specific emotions (Eerola and Vuoskoski 2013), suggest particular musical styles (Li and Chan 2011), influence perception of expression (Webster and Weir 2005) and mediate the urge to move one's body in time to the music (Burger et al. 2014). Traditional tempo estimation methods typically detect signal periodicities that reflect the underlying rhythmic structure of the music, often using some form of autocorrelation of the amplitude envelope (Lartillot and Toiviainen 2007). Recently, AI-based methods utilising convolutional or recurrent neural networks (CNNs, RNNs) on spectral representations of the audio signal have enjoyed significant improvements in accuracy (Aarabi and Peeters 2022). Common AI-based techniques include those based on probability (e.g., Bayesian approaches, hidden Markov models (HMM)), classification and statistical learning (e.g., support vector machines (SVM)), and artificial neural networks (ANNs) (e.g., self-organising maps (SOMs), CNNs, RNNs, deep learning (DL)). The aim here is to provide an overview of some of the more common AI-based tempo estimation algorithms and to shine a light on notable benefits and potential drawbacks of each. Limitations of AI in this field in general are also considered, as is the capacity for such methods to account for idiosyncrasies inherent in tempo perception, i.e., how well AI-based approaches are able to think and act like humans.
- [226] arXiv:2401.00230 (cross-list from cs.LG) [ pdf , other ]
-
Title: Transformer Multivariate Forecasting: Less is More?Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In the domain of multivariate forecasting, transformer models stand out as powerful apparatus, displaying exceptional capabilities in handling messy datasets from real-world contexts. However, the inherent complexity of these datasets, characterized by numerous variables and lengthy temporal sequences, poses challenges, including increased noise and extended model runtime. This paper focuses on reducing redundant information to elevate forecasting accuracy while optimizing runtime efficiency. We propose a novel transformer forecasting framework enhanced by Principal Component Analysis (PCA) to tackle this challenge. The framework is evaluated by five state-of-the-art (SOTA) models and four diverse real-world datasets. Our experimental results demonstrate the framework's ability to minimize prediction errors across all models and datasets while significantly reducing runtime. From the model perspective, one of the PCA-enhanced models: PCA+Crossformer, reduces mean square errors (MSE) by 33.3% and decreases runtime by 49.2% on average. From the dataset perspective, the framework delivers 14.3% MSE and 76.6% runtime reduction on Electricity datasets, as well as 4.8% MSE and 86.9% runtime reduction on Traffic datasets. This study aims to advance various SOTA models and enhance transformer-based time series forecasting for intricate data. Code is available at: this https URL .
- [227] arXiv:2401.00238 (cross-list from cs.CL) [ pdf , other ]
-
Title: How to Evaluate Coreference in Literary Texts?Comments: Presented as a poster at the conference CHR2023 (non archival)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this short paper, we examine the main metrics used to evaluate textual coreference and we detail some of their limitations. We show that a unique score cannot represent the full complexity of the problem at stake, and is thus uninformative, or even misleading. We propose a new way of evaluating coreference, taking into account the context (in our case, the analysis of fictions, esp. novels). More specifically, we propose to distinguish long coreference chains (corresponding to main characters), from short ones (corresponding to secondary characters), and singletons (isolated elements). This way, we hope to get more interpretable and thus more informative results through evaluation.
- [228] arXiv:2401.00248 (cross-list from cs.CV) [ pdf , other ]
-
Title: Promoting Segment Anything Model towards Highly Accurate Dichotomous Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The Segment Anything Model (SAM) represents a significant breakthrough into foundation models for computer vision, providing a large-scale image segmentation model. However, despite SAM's zero-shot performance, its segmentation masks lack fine-grained details, particularly in accurately delineating object boundaries. We have high expectations regarding whether SAM, as a foundation model, can be improved towards highly accurate object segmentation, which is known as dichotomous image segmentation (DIS). To address this issue, we propose DIS-SAM, which advances SAM towards DIS with extremely accurate details. DIS-SAM is a framework specifically tailored for highly accurate segmentation, maintaining SAM's promptable design. DIS-SAM employs a two-stage approach, integrating SAM with a modified IS-Net dedicated to DIS. Despite its simplicity, DIS-SAM demonstrates significantly enhanced segmentation accuracy compared to SAM and HQ-SAM.
- [229] arXiv:2401.00285 (cross-list from cs.CV) [ pdf , other ]
-
Title: BusReF: Infrared-Visible images registration and fusion focus on reconstructible area using one set of featuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In a scenario where multi-modal cameras are operating together, the problem of working with non-aligned images cannot be avoided. Yet, existing image fusion algorithms rely heavily on strictly registered input image pairs to produce more precise fusion results, as a way to improve the performance of downstream high-level vision tasks. In order to relax this assumption, one can attempt to register images first. However, the existing methods for registering multiple modalities have limitations, such as complex structures and reliance on significant semantic information. This paper aims to address the problem of image registration and fusion in a single framework, called BusRef. We focus on Infrared-Visible image registration and fusion task (IVRF). In this framework, the input unaligned image pairs will pass through three stages: Coarse registration, Fine registration and Fusion. It will be shown that the unified approach enables more robust IVRF. We also propose a novel training and evaluation strategy, involving the use of masks to reduce the influence of non-reconstructible regions on the loss functions, which greatly improves the accuracy and robustness of the fusion task. Last but not least, a gradient-aware fusion network is designed to preserve the complementary information. The advanced performance of this algorithm is demonstrated by
- [230] arXiv:2401.00288 (cross-list from cs.SE) [ pdf , other ]
-
Title: Deep Learning for Code Intelligence: Survey, Benchmark and ToolkitAuthors: Yao Wan , Yang He , Zhangqian Bi , Jianguo Zhang , Hongyu Zhang , Yulei Sui , Guandong Xu , Hai Jin , Philip S. YuSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora, with the aim of developing intelligent tools to improve the quality and productivity of computer programming. Currently, there is already a thriving research community focusing on code intelligence, with efforts ranging from software engineering, machine learning, data mining, natural language processing, and programming languages. In this paper, we conduct a comprehensive literature review on deep learning for code intelligence, from the aspects of code representation learning, deep learning techniques, and application tasks. We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models. In particular, we inspect the existing code intelligence models under the basis of code representation learning, and provide a comprehensive overview to enhance comprehension of the present state of code intelligence. Furthermore, we publicly release the source code and data resources to provide the community with a ready-to-use benchmark, which can facilitate the evaluation and comparison of existing and future code intelligence models ( this https URL ). At last, we also point out several challenging and promising directions for future research.
- [231] arXiv:2401.00290 (cross-list from cs.CL) [ pdf , other ]
-
Title: Red Teaming for Large Language Models At Scale: Tackling Hallucinations on Mathematics TasksAuthors: Aleksander Buszydlik , Karol Dobiczek , Michał Teodor Okoń , Konrad Skublicki , Philip Lippmann , Jie YangComments: Accepted to The ART of Safety: Workshop on Adversarial testing and Red-Teaming for generative AI (IJCNLP-AACL 2023)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We consider the problem of red teaming LLMs on elementary calculations and algebraic tasks to evaluate how various prompting techniques affect the quality of outputs. We present a framework to procedurally generate numerical questions and puzzles, and compare the results with and without the application of several red teaming techniques. Our findings suggest that even though structured reasoning and providing worked-out examples slow down the deterioration of the quality of answers, the gpt-3.5-turbo and gpt-4 models are not well suited for elementary calculations and reasoning tasks, also when being red teamed.
- [232] arXiv:2401.00330 (cross-list from cs.LG) [ pdf , other ]
-
Title: Efficient Two-Phase Offline Deep Reinforcement Learning from Preference FeedbackSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this work, we consider the offline preference-based reinforcement learning problem. We focus on the two-phase learning approach that is prevalent in previous reinforcement learning from human preference works. We find a challenge in applying two-phase learning in the offline PBRL setting that the learned utility model can be too hard for the learning agent to optimize during the second learning phase. To overcome the challenge, we propose a two-phasing learning approach under behavior regularization through action clipping. The insight is that the state-actions which are poorly covered by the dataset can only provide limited information and increase the complexity of the problem in the second learning phase. Our method ignores such state-actions during the second learning phase to achieve higher learning efficiency. We empirically verify that our method has high learning efficiency on a variety of datasets in robotic control environments.
- [233] arXiv:2401.00365 (cross-list from cs.LG) [ pdf , other ]
-
Title: HQ-VAE: Hierarchical Discrete Representation Learning with Variational BayesAuthors: Yuhta Takida , Yukara Ikemiya , Takashi Shibuya , Kazuki Shimada , Woosung Choi , Chieh-Hsin Lai , Naoki Murata , Toshimitsu Uesaka , Kengo Uchida , Wei-Hsiang Liao , Yuki MitsufujiComments: 34 pages with 17 figures, accepted for TMLRSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity reconstructions. However, such hierarchical extensions of VQ-VAE often suffer from the codebook/layer collapse issue, where the codebook is not efficiently used to express the data, and hence degrades reconstruction accuracy. To mitigate this problem, we propose a novel unified framework to stochastically learn hierarchical discrete representation on the basis of the variational Bayes framework, called hierarchically quantized variational autoencoder (HQ-VAE). HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE), and provides them with a Bayesian training scheme. Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance. We also validated HQ-VAE in terms of its applicability to a different modality with an audio dataset.
- [234] arXiv:2401.00379 (cross-list from cs.SE) [ pdf , other ]
-
Title: DREAM: Debugging and Repairing AutoML PipelinesComments: 12 pages, 10 figuresSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Deep Learning models have become an integrated component of modern software systems. In response to the challenge of model design, researchers proposed Automated Machine Learning (AutoML) systems, which automatically search for model architecture and hyperparameters for a given task. Like other software systems, existing AutoML systems suffer from bugs. We identify two common and severe bugs in AutoML, performance bug (i.e., searching for the desired model takes an unreasonably long time) and ineffective search bug (i.e., AutoML systems are not able to find an accurate enough model). After analyzing the workflow of AutoML, we observe that existing AutoML systems overlook potential opportunities in search space, search method, and search feedback, which results in performance and ineffective search bugs. Based on our analysis, we design and implement DREAM, an automatic debugging and repairing system for AutoML systems. It monitors the process of AutoML to collect detailed feedback and automatically repairs bugs by expanding search space and leveraging a feedback-driven search strategy. Our evaluation results show that DREAM can effectively and efficiently repair AutoML bugs.
- [235] arXiv:2401.00390 (cross-list from cs.CV) [ pdf , other ]
-
Title: Horizontal Federated Computer VisionComments: 11 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract: In the modern world, the amount of visual data recorded has been rapidly increasing. In many cases, data is stored in geographically distinct locations and thus requires a large amount of time and space to consolidate. Sometimes, there are also regulations for privacy protection which prevent data consolidation. In this work, we present federated implementations for object detection and recognition using a federated Faster R-CNN (FRCNN) and image segmentation using a federated Fully Convolutional Network (FCN). Our FRCNN was trained on 5000 examples of the COCO2017 dataset while our FCN was trained on the entire train set of the CamVid dataset. The proposed federated models address the challenges posed by the increasing volume and decentralized nature of visual data, offering efficient solutions in compliance with privacy regulations.
- [236] arXiv:2401.00391 (cross-list from cs.RO) [ pdf , other ]
-
Title: Controllable Safety-Critical Closed-loop Traffic Simulation via Guided DiffusionComments: Submitted to CVPR 2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Evaluating the performance of autonomous vehicle planning algorithms necessitates simulating long-tail traffic scenarios. Traditional methods for generating safety-critical scenarios often fall short in realism and controllability. Furthermore, these techniques generally neglect the dynamics of agent interactions. To mitigate these limitations, we introduce a novel closed-loop simulation framework rooted in guided diffusion models. Our approach yields two distinct advantages: 1) the generation of realistic long-tail scenarios that closely emulate real-world conditions, and 2) enhanced controllability, enabling more comprehensive and interactive evaluations. We achieve this through novel guidance objectives that enhance road progress while lowering collision and off-road rates. We develop a novel approach to simulate safety-critical scenarios through an adversarial term in the denoising process, which allows the adversarial agent to challenge a planner with plausible maneuvers, while all agents in the scene exhibit reactive and realistic behaviors. We validate our framework empirically using the NuScenes dataset, demonstrating improvements in both realism and controllability. These findings affirm that guided diffusion models provide a robust and versatile foundation for safety-critical, interactive traffic simulation, extending their utility across the broader landscape of autonomous driving. For additional resources and demonstrations, visit our project page at this https URL .
- [237] arXiv:2401.00393 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Generative Model-Driven Synthetic Training Image Generation: An Approach to Cognition in Rail Defect DetectionAuthors: Rahatara Ferdousi , Chunsheng Yang , M. Anwar Hossain , Fedwa Laamarti , M. Shamim Hossain , Abdulmotaleb El SaddikComments: 26 pages, 13 figures, Springer JournalSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Image and Video Processing (eess.IV)
Abstract: Recent advancements in cognitive computing, with the integration of deep learning techniques, have facilitated the development of intelligent cognitive systems (ICS). This is particularly beneficial in the context of rail defect detection, where the ICS would emulate human-like analysis of image data for defect patterns. Despite the success of Convolutional Neural Networks (CNN) in visual defect classification, the scarcity of large datasets for rail defect detection remains a challenge due to infrequent accident events that would result in defective parts and images. Contemporary researchers have addressed this data scarcity challenge by exploring rule-based and generative data augmentation models. Among these, Variational Autoencoder (VAE) models can generate realistic data without extensive baseline datasets for noise modeling. This study proposes a VAE-based synthetic image generation technique for rail defects, incorporating weight decay regularization and image reconstruction loss to prevent overfitting. The proposed method is applied to create a synthetic dataset for the Canadian Pacific Railway (CPR) with just 50 real samples across five classes. Remarkably, 500 synthetic samples are generated with a minimal reconstruction loss of 0.021. A Visual Transformer (ViT) model underwent fine-tuning using this synthetic CPR dataset, achieving high accuracy rates (98%-99%) in classifying the five defect classes. This research offers a promising solution to the data scarcity challenge in rail defect detection, showcasing the potential for robust ICS development in this domain.
- [238] arXiv:2401.00409 (cross-list from cs.CV) [ pdf , other ]
-
Title: A Two-stream Hybrid CNN-Transformer Network for Skeleton-based Human Interaction RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Human Interaction Recognition is the process of identifying interactive actions between multiple participants in a specific situation. The aim is to recognise the action interactions between multiple entities and their meaning. Many single Convolutional Neural Network has issues, such as the inability to capture global instance interaction features or difficulty in training, leading to ambiguity in action semantics. In addition, the computational complexity of the Transformer cannot be ignored, and its ability to capture local information and motion features in the image is poor. In this work, we propose a Two-stream Hybrid CNN-Transformer Network (THCT-Net), which exploits the local specificity of CNN and models global dependencies through the Transformer. CNN and Transformer simultaneously model the entity, time and space relationships between interactive entities respectively. Specifically, Transformer-based stream integrates 3D convolutions with multi-head self-attention to learn inter-token correlations; We propose a new multi-branch CNN framework for CNN-based streams that automatically learns joint spatio-temporal features from skeleton sequences. The convolutional layer independently learns the local features of each joint neighborhood and aggregates the features of all joints. And the raw skeleton coordinates as well as their temporal difference are integrated with a dual-branch paradigm to fuse the motion features of the skeleton. Besides, a residual structure is added to speed up training convergence. Finally, the recognition results of the two branches are fused using parallel splicing. Experimental results on diverse and challenging datasets, demonstrate that the proposed method can better comprehend and infer the meaning and context of various actions, outperforming state-of-the-art methods.
- [239] arXiv:2401.00420 (cross-list from cs.CV) [ pdf , other ]
-
Title: SynCDR : Training Cross Domain Retrieval Models with Synthetic DataComments: Pre-printSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In cross-domain retrieval, a model is required to identify images from the same semantic category across two visual domains. For instance, given a sketch of an object, a model needs to retrieve a real image of it from an online store's catalog. A standard approach for such a problem is learning a feature space of images where Euclidean distances reflect similarity. Even without human annotations, which may be expensive to acquire, prior methods function reasonably well using unlabeled images for training. Our problem constraint takes this further to scenarios where the two domains do not necessarily share any common categories in training data. This can occur when the two domains in question come from different versions of some biometric sensor recording identities of different people. We posit a simple solution, which is to generate synthetic data to fill in these missing category examples across domains. This, we do via category preserving translation of images from one visual domain to another. We compare approaches specifically trained for this translation for a pair of domains, as well as those that can use large-scale pre-trained text-to-image diffusion models via prompts, and find that the latter can generate better replacement synthetic data, leading to more accurate cross-domain retrieval models. Our best SynCDR model can outperform prior art by up to 15\%. Code for our work is available at this https URL .
- [240] arXiv:2401.00426 (cross-list from cs.CL) [ pdf , other ]
-
Title: keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLMAuthors: Chaojie Wang , Yishi Xu , Zhong Peng , Chenxi Zhang , Bo Chen , Xinrun Wang , Lei Feng , Bo AnComments: 12 pages, 6 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have exhibited remarkable performance on various natural language processing (NLP) tasks, especially for question answering. However, in the face of problems beyond the scope of knowledge, these LLMs tend to talk nonsense with a straight face, where the potential solution could be incorporating an Information Retrieval (IR) module and generating response based on these retrieved knowledge. In this paper, we present a novel framework to assist LLMs, such as ChatGPT, to retrieve question-related structured information on the knowledge graph, and demonstrate that Knowledge-based question answering (Keqing) could be a nature Chain-of-Thought (CoT) mentor to guide the LLM to sequentially find the answer entities of a complex question through interpretable logical chains. Specifically, the workflow of Keqing will execute decomposing a complex question according to predefined templates, retrieving candidate entities on knowledge graph, reasoning answers of sub-questions, and finally generating response with reasoning paths, which greatly improves the reliability of LLM's response. The experimental results on KBQA datasets show that Keqing can achieve competitive performance and illustrate the logic of answering each question.
- [241] arXiv:2401.00435 (cross-list from cs.CV) [ pdf , other ]
-
Title: Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR. Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models. However, existing methods fail to effectively utilize bidirectional context information during the inference stage. Furthermore, current bidirectional training methods are primarily designed for string decoders and cannot adequately generalize to tree decoders, which offer superior generalization capabilities and structural analysis capacity. In order to overcome these limitations, we propose the Mirror-Flipped Symbol Layout Tree (MF-SLT) and Bidirectional Asynchronous Training (BAT) structure. Our method extends the bidirectional training strategy to the tree decoder, allowing for more effective training by leveraging bidirectional information. Additionally, we analyze the impact of the visual and linguistic perception of the HMER model separately and introduce the Shared Language Modeling (SLM) mechanism. Through the SLM, we enhance the model's robustness and generalization when dealing with visual ambiguity, particularly in scenarios with abundant training data. Our approach has been validated through extensive experiments, demonstrating its ability to achieve new state-of-the-art results on the CROHME 2014, 2016, and 2019 datasets, as well as the HME100K dataset. The code used in our experiments will be publicly available.
- [242] arXiv:2401.00477 (cross-list from cs.IT) [ pdf , other ]
-
Title: Coding for Gaussian Two-Way Channels: Linear and Learning-Based ApproachesAuthors: Junghoon Kim , Taejoon Kim , Anindya Bijoy Das , Seyyedali Hosseinalipour , David J. Love , Christopher G. BrintonComments: This work has been submitted to the IEEE Transactions on Information TheorySubjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Although user cooperation cannot improve the capacity of Gaussian two-way channels (GTWCs) with independent noises, it can improve communication reliability. In this work, we aim to enhance and balance the communication reliability in GTWCs by minimizing the sum of error probabilities via joint design of encoders and decoders at the users. We first formulate general encoding/decoding functions, where the user cooperation is captured by the coupling of user encoding processes. The coupling effect renders the encoder/decoder design non-trivial, requiring effective decoding to capture this effect, as well as efficient power management at the encoders within power constraints. To address these challenges, we propose two different two-way coding strategies: linear coding and learning-based coding. For linear coding, we propose optimal linear decoding and discuss new insights on encoding regarding user cooperation to balance reliability. We then propose an efficient algorithm for joint encoder/decoder design. For learning-based coding, we introduce a novel recurrent neural network (RNN)-based coding architecture, where we propose interactive RNNs and a power control layer for encoding, and we incorporate bi-directional RNNs with an attention mechanism for decoding. Through simulations, we show that our two-way coding methodologies outperform conventional channel coding schemes (that do not utilize user cooperation) significantly in sum-error performance. We also demonstrate that our linear coding excels at high signal-to-noise ratios (SNRs), while our RNN-based coding performs best at low SNRs. We further investigate our two-way coding strategies in terms of power distribution, two-way coding benefit, different coding rates, and block-length gain.
- [243] arXiv:2401.00496 (cross-list from cs.CV) [ pdf , other ]
-
Title: SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy ChallengeAuthors: Dimitrios Psychogyios , Emanuele Colleoni , Beatrice Van Amsterdam , Chih-Yang Li , Shu-Yu Huang , Yuchong Li , Fucang Jia , Baosheng Zou , Guotai Wang , Yang Liu , Maxence Boels , Jiayu Huo , Rachel Sparks , Prokar Dasgupta , Alejandro Granados , Sebastien Ourselin , Mengya Xu , An Wang , Yanan Wu , Long Bai , Hongliang Ren , Atsushi Yamada , Yuriko Harai , Yuto Ishikawa , Kazuyuki Hayashi , Jente Simoens , Pieter DeBacker , Francesco Cisternino , Gabriele Furnari , Alex Mottrie , Federica Ferraguti , Satoshi Kondo , Satoshi Kasai , Kousuke Hirasawa , Soohee Kim , Seung Hyun Lee , Kyu Eun Lee , Hyoun-Joong Kong , Kui Fu , Chao Li , Shan An , Stefanie Krell , Sebastian Bodenstedt , Nicolas Ayobi , Alejandra Perez , Santiago Rodriguez , Juanita Puentes , Pablo Arbelaez , Omid Mohareri , Danail StoyanovSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in-vivo, dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold. First, to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain. Second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage against their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation. The complete SAR-RARP50 dataset is available at: this https URL
- [244] arXiv:2401.00521 (cross-list from cs.LG) [ pdf , other ]
-
Title: Multi-spatial Multi-temporal Air Quality Forecasting with Integrated Monitoring and Reanalysis DataSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Applications (stat.AP)
Abstract: Accurate air quality forecasting is crucial for public health, environmental monitoring and protection, and urban planning. However, existing methods fail to effectively utilize multi-scale information, both spatially and temporally. Spatially, there is a lack of integration between individual monitoring stations and city-wide scales. Temporally, the periodic nature of air quality variations is often overlooked or inadequately considered. To address these limitations, we present a novel Multi-spatial Multi-temporal air quality forecasting method based on Graph Convolutional Networks and Gated Recurrent Units (M2G2), bridging the gap in air quality forecasting across spatial and temporal scales. The proposed framework consists of two modules: Multi-scale Spatial GCN (MS-GCN) for spatial information fusion and Multi-scale Temporal GRU(MT-GRU) for temporal information integration. In the spatial dimension, the MS-GCN module employs a bidirectional learnable structure and a residual structure, enabling comprehensive information exchange between individual monitoring stations and the city-scale graph. Regarding the temporal dimension, the MT-GRU module adaptively combines information from different temporal scales through parallel hidden states. Leveraging meteorological indicators and four air quality indicators, we present comprehensive comparative analyses and ablation experiments, showcasing the higher accuracy of M2G2 in comparison to nine currently available advanced approaches across all aspects. The improvements of M2G2 over the second-best method on RMSE of the 24h/48h/72h are as follows: PM2.5: (7.72%, 6.67%, 10.45%); PM10: (6.43%, 5.68%, 7.73%); NO2: (5.07%, 7.76%, 16.60%); O3: (6.46%, 6.86%, 9.79%). Furthermore, we demonstrate the effectiveness of each module of M2G2 by ablation study.
- [245] arXiv:2401.00525 (cross-list from cs.SI) [ pdf , ps , other ]
-
Title: Pack and Measure: An Effective Approach for Influence Propagation in Social NetworksSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
Abstract: The Influence Maximization problem under the Independent Cascade model (IC) is considered. The problem asks for a minimal set of vertices to serve as "seed set" from which a maximum influence propagation is expected. New seed-set selection methods are introduced based on the notions of a $d$-packing and vertex centrality. In particular, we focus on selecting seed-vertices that are far apart and whose influence-values are the highest in their local communities. Our best results are achieved via an initial computation of a $d$-Packing followed by selecting either vertices of high degree or high centrality in their respective closed neighborhoods. This overall "Pack and Measure" approach proves highly effective as a seed selection method.
- [246] arXiv:2401.00529 (cross-list from cs.LG) [ pdf , other ]
-
Title: GraphGPT: Graph Learning with Generative Pre-trained TransformersComments: 9 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We introduce \textit{GraphGPT}, a novel model for Graph learning by self-supervised Generative Pre-training Transformers. Our model transforms each graph or sampled subgraph into a sequence of tokens representing the node, edge and attributes reversibly using the Eulerian path first. Then we feed the tokens into a standard transformer decoder and pre-train it with the next-token-prediction (NTP) task. Lastly, we fine-tune the GraphGPT model with the supervised tasks. This intuitive, yet effective model achieves superior or close results to the state-of-the-art methods for the graph-, edge- and node-level tasks on the large scale molecular dataset PCQM4Mv2, the protein-protein association dataset ogbl-ppa and the ogbn-proteins dataset from the Open Graph Benchmark (OGB). Furthermore, the generative pre-training enables us to train GraphGPT up to 400M+ parameters with consistently increasing performance, which is beyond the capability of GNNs and previous graph transformers. The source code and pre-trained checkpoints will be released soon\footnote{\url{ this https URL }} to pave the way for the graph foundation model research, and also to assist the scientific discovery in pharmaceutical, chemistry, material and bio-informatics domains, etc.
- [247] arXiv:2401.00532 (cross-list from cs.LG) [ pdf , other ]
-
Title: On the Necessity of Metalearning: Learning Suitable Parameterizations for Learning ProcessesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this paper we will discuss metalearning and how we can go beyond the current classical learning paradigm. We will first address the importance of inductive biases in the learning process and what is at stake: the quantities of data necessary to learn. We will subsequently see the importance of choosing suitable parameterizations to end up with well-defined learning processes. Especially since in the context of real-world applications, we face numerous biases due, e.g., to the specificities of sensors, the heterogeneity of data sources, the multiplicity of points of view, etc. This will lead us to the idea of exploiting the structuring of the concepts to be learned in order to organize the learning process that we published previously. We conclude by discussing the perspectives around parameter-tying schemes and the emergence of universal aspects in the models thus learned.
- [248] arXiv:2401.00536 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Multi-Task, Multi-Modal Approach for Predicting Categorical and Dimensional EmotionsComments: Companion Publication of the 25th International Conference on Multimodal Interaction (pp. 311-317)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Speech emotion recognition (SER) has received a great deal of attention in recent years in the context of spontaneous conversations. While there have been notable results on datasets like the well known corpus of naturalistic dyadic conversations, IEMOCAP, for both the case of categorical and dimensional emotions, there are few papers which try to predict both paradigms at the same time. Therefore, in this work, we aim to highlight the performance contribution of multi-task learning by proposing a multi-task, multi-modal system that predicts categorical and dimensional emotions. The results emphasise the importance of cross-regularisation between the two types of emotions. Our approach consists of a multi-task, multi-modal architecture that uses parallel feature refinement through self-attention for the feature of each modality. In order to fuse the features, our model introduces a set of learnable bridge tokens that merge the acoustic and linguistic features with the help of cross-attention. Our experiments for categorical emotions on 10-fold validation yield results comparable to the current state-of-the-art. In our configuration, our multi-task approach provides better results compared to learning each paradigm separately. On top of that, our best performing model achieves a high result for valence compared to the previous multi-task experiments.
- [249] arXiv:2401.00563 (cross-list from cs.CR) [ pdf , other ]
-
Title: KernelGPT: Enhanced Kernel Fuzzing via Large Language ModelsSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: Bugs in operating system kernels can affect billions of devices and users all over the world. As a result, a large body of research has been focused on kernel fuzzing, i.e., automatically generating syscall (system call) sequences to detect potential kernel bugs or vulnerabilities. Syzkaller, one of the most widely studied kernel fuzzers, aims to generate valid syscall sequences based on predefined specifications written in syzlang, a domain-specific language for defining syscalls, their arguments, and the relationships between them. While there has been existing work trying to automate Syzkaller specification generation, this still remains largely manual work and a large number of important syscalls are still uncovered. In this paper, we propose KernelGPT, the first approach to automatically inferring Syzkaller specifications via Large Language Models (LLMs) for enhanced kernel fuzzing. Our basic insight is that LLMs have seen massive kernel code, documentation, and use cases during pre-training, and thus can automatically distill the necessary information for making valid syscalls. More specifically, KernelGPT leverages an iterative approach to automatically infer all the necessary specification components, and further leverages the validation feedback to repair/refine the initial specifications. Our preliminary results demonstrate that KernelGPT can help Syzkaller achieve higher coverage and find multiple previously unknown bugs. Moreover, we also received a request from the Syzkaller team to upstream specifications inferred by KernelGPT.
- [250] arXiv:2401.00579 (cross-list from cs.CL) [ pdf , other ]
-
Title: Exploring the Effectiveness of Instruction Tuning in Biomedical Language ProcessingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs), particularly those similar to ChatGPT, have significantly influenced the field of Natural Language Processing (NLP). While these models excel in general language tasks, their performance in domain-specific downstream tasks such as biomedical and clinical Named Entity Recognition (NER), Relation Extraction (RE), and Medical Natural Language Inference (NLI) is still evolving. In this context, our study investigates the potential of instruction tuning for biomedical language processing, applying this technique to two general LLMs of substantial scale. We present a comprehensive, instruction-based model trained on a dataset that consists of approximately $200,000$ instruction-focused samples. This dataset represents a carefully curated compilation of existing data, meticulously adapted and reformatted to align with the specific requirements of our instruction-based tasks. This initiative represents an important step in utilising such models to achieve results on par with specialised encoder-only models like BioBERT and BioClinicalBERT for various classical biomedical NLP tasks. Our work includes an analysis of the dataset's composition and its impact on model performance, providing insights into the intricacies of instruction tuning. By sharing our codes, models, and the distinctively assembled instruction-based dataset, we seek to encourage ongoing research and development in this area.
- [251] arXiv:2401.00608 (cross-list from cs.CV) [ pdf , other ]
-
Title: Bringing Back the Context: Camera Trap Species Identification as Link Prediction on Multimodal Knowledge GraphsAuthors: Vardaan Pahuja , Weidi Luo , Yu Gu , Cheng-Hao Tu , Hong-You Chen , Tanya Berger-Wolf , Charles Stewart , Song Gao , Wei-Lun Chao , Yu SuComments: 13 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Camera traps are valuable tools in animal ecology for biodiversity monitoring and conservation. However, challenges like poor generalization to deployment at new unseen locations limit their practical application. Images are naturally associated with heterogeneous forms of context possibly in different modalities. In this work, we leverage the structured context associated with the camera trap images to improve out-of-distribution generalization for the task of species identification in camera traps. For example, a photo of a wild animal may be associated with information about where and when it was taken, as well as structured biology knowledge about the animal species. While typically overlooked by existing work, bringing back such context offers several potential benefits for better image understanding, such as addressing data scarcity and enhancing generalization. However, effectively integrating such heterogeneous context into the visual domain is a challenging problem. To address this, we propose a novel framework that reformulates species classification as link prediction in a multimodal knowledge graph (KG). This framework seamlessly integrates various forms of multimodal context for visual recognition. We apply this framework for out-of-distribution species classification on the iWildCam2020-WILDS and Snapshot Mountain Zebra datasets and achieve competitive performance with state-of-the-art approaches. Furthermore, our framework successfully incorporates biological taxonomy for improved generalization and enhances sample efficiency for recognizing under-represented species.
- [252] arXiv:2401.00609 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Survey of Personality, Persona, and Profile in Conversational Agents and ChatbotsAuthors: Richard SutcliffeComments: 25 pages, 6 tables, 207 referencesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We present a review of personality in neural conversational agents (CAs), also called chatbots. First, we define Personality, Persona, and Profile. We explain all personality schemes which have been used in CAs, and list models under the scheme(s) which they use. Second we describe 21 datasets which have been developed in recent CA personality research. Third, we define the methods used to embody personality in a CA, and review recent models using them. Fourth, we survey some relevant reviews on CAs, personality, and related topics. Finally, we draw conclusions and identify some research challenges for this important emerging field.
- [253] arXiv:2401.00631 (cross-list from cs.NI) [ pdf , other ]
-
Title: Coordinated Deep Neural Networks: A Versatile Edge Offloading AlgorithmSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: As artificial intelligence (AI) applications continue to expand, there is a growing need for deep neural network (DNN) models. Although DNN models deployed at the edge are promising to provide AI as a service with low latency, their cooperation is yet to be explored. In this paper, we consider the DNN service providers share their computing resources as well as their models' parameters and allow other DNNs to offload their computations without mirroring. We propose a novel algorithm called coordinated DNNs on edge (\textbf{CoDE}) that facilitates coordination among DNN services by creating multi-task DNNs out of individual models. CoDE aims to find the optimal path that results in the lowest possible cost, where the cost reflects the inference delay, model accuracy, and local computation workload. With CoDE, DNN models can make new paths for inference by using their own or other models' parameters. We then evaluate the performance of CoDE through numerical experiments. The results demonstrate a $75\%$ reduction in the local service computation workload while degrading the accuracy by only $2\%$ and having the same inference time in a balanced load condition. Under heavy load, CoDE can further decrease the inference time by $30\%$ while the accuracy is reduced by only $4\%$.
- [254] arXiv:2401.00663 (cross-list from cs.CV) [ pdf , other ]
-
Title: 1st Place Solution for 5th LSVOS Challenge: Referring Video Object SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The recent transformer-based models have dominated the Referring Video Object Segmentation (RVOS) task due to the superior performance. Most prior works adopt unified DETR framework to generate segmentation masks in query-to-instance manner. In this work, we integrate strengths of that leading RVOS models to build up an effective paradigm. We first obtain binary mask sequences from the RVOS models. To improve the consistency and quality of masks, we propose Two-Stage Multi-Model Fusion strategy. Each stage rationally ensembles RVOS models based on framework design as well as training strategy, and leverages different video object segmentation (VOS) models to enhance mask coherence by object propagation mechanism. Our method achieves 75.7% J&F on Ref-Youtube-VOS validation set and 70% J&F on test set, which ranks 1st place on 5th Large-scale Video Object Segmentation Challenge (ICCV 2023) track 3. Code is available at this https URL .
- [255] arXiv:2401.00685 (cross-list from cs.LG) [ pdf , other ]
-
Title: Communication-Efficient Federated Learning for LEO Satellite Networks Integrated with HAPs Using Hybrid NOMA-OFDMSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Space AI has become increasingly important and sometimes even necessary for government, businesses, and society. An active research topic under this mission is integrating federated learning (FL) with satellite communications (SatCom) so that numerous low Earth orbit (LEO) satellites can collaboratively train a machine learning model. However, the special communication environment of SatCom leads to a very slow FL training process up to days and weeks. This paper proposes NomaFedHAP, a novel FL-SatCom approach tailored to LEO satellites, that (1) utilizes high-altitude platforms (HAPs) as distributed parameter servers (PS) to enhance satellite visibility, and (2) introduces non-orthogonal multiple access (NOMA) into LEO to enable fast and bandwidth-efficient model transmissions. In addition, NomaFedHAP includes (3) a new communication topology that exploits HAPs to bridge satellites among different orbits to mitigate the Doppler shift, and (4) a new FL model aggregation scheme that optimally balances models between different orbits and shells. Moreover, we (5) derive a closed-form expression of the outage probability for satellites in near and far shells, as well as for the entire system. Our extensive simulations have validated the mathematical analysis and demonstrated the superior performance of NomaFedHAP in achieving fast and efficient FL model convergence with high accuracy as compared to the state-of-the-art.
- [256] arXiv:2401.00689 (cross-list from cs.CL) [ pdf , other ]
-
Title: Large language model for Bible sentiment analysis: Sermon on the MountSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The revolution of natural language processing via large language models has motivated its use in multidisciplinary areas that include social sciences and humanities and more specifically, comparative religion. Sentiment analysis provides a mechanism to study the emotions expressed in text. Recently, sentiment analysis has been used to study and compare translations of the Bhagavad Gita, which is a fundamental and sacred Hindu text. In this study, we use sentiment analysis for studying selected chapters of the Bible. These chapters are known as the Sermon on the Mount. We utilize a pre-trained language model for sentiment analysis by reviewing five translations of the Sermon on the Mount, which include the King James version, the New International Version, the New Revised Standard Version, the Lamsa Version, and the Basic English Version. We provide a chapter-by-chapter and verse-by-verse comparison using sentiment and semantic analysis and review the major sentiments expressed. Our results highlight the varying sentiments across the chapters and verses. We found that the vocabulary of the respective translations is significantly different. We detected different levels of humour, optimism, and empathy in the respective chapters that were used by Jesus to deliver his message.
- [257] arXiv:2401.00698 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Large Language Models aren't all that you needSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper describes the architecture and systems built towards solving the SemEval 2023 Task 2: MultiCoNER II (Multilingual Complex Named Entity Recognition) [1]. We evaluate two approaches (a) a traditional Conditional Random Fields model and (b) a Large Language Model (LLM) fine-tuned with a customized head and compare the two approaches. The novel ideas explored are: 1) Decaying auxiliary loss (with residual) - where we train the model on an auxiliary task of Coarse-Grained NER and include this task as a part of the loss function 2) Triplet token blending - where we explore ways of blending the embeddings of neighboring tokens in the final NER layer prior to prediction 3) Task-optimal heads - where we explore a variety of custom heads and learning rates for the final layer of the LLM. We also explore multiple LLMs including GPT-3 and experiment with a variety of dropout and other hyperparameter settings before arriving at our final model which achieves micro & macro f1 of 0.85/0.84 (on dev) and 0.67/0.61 on the test data . We show that while pre-trained LLMs, by themselves, bring about a large improvement in scores as compared to traditional models, we also demonstrate that tangible improvements to the Macro-F1 score can be made by augmenting the LLM with additional feature/loss/model engineering techniques described above.
- [258] arXiv:2401.00700 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: An attempt to generate new bridge types from latent space of generative adversarial networkAuthors: Hongjun ZhangComments: 8 pages, 4 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Try to generate new bridge types using generative artificial intelligence technology. Symmetric structured image dataset of three-span beam bridge, arch bridge, cable-stayed bridge and suspension bridge are used . Based on Python programming language, TensorFlow and Keras deep learning platform framework , as well as Wasserstein loss function and Lipschitz constraints, generative adversarial network is constructed and trained. From the obtained low dimensional bridge-type latent space sampling, new bridge types with asymmetric structures can be generated. Generative adversarial network can create new bridge types by organically combining different structural components on the basis of human original bridge types. It has a certain degree of human original ability. Generative artificial intelligence technology can open up imagination space and inspire humanity.
- [259] arXiv:2401.00711 (cross-list from cs.CV) [ pdf , other ]
-
Title: Text2Avatar: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable AttributeAuthors: Chaoqun Gong , Yuqin Dai , Ronghui Li , Achun Bao , Jun Li , Jian Yang , Yachao Zhang , Xiu LiSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Generating 3D human models directly from text helps reduce the cost and time of character modeling. However, achieving multi-attribute controllable and realistic 3D human avatar generation is still challenging due to feature coupling and the scarcity of realistic 3D human avatar datasets. To address these issues, we propose Text2Avatar, which can generate realistic-style 3D avatars based on the coupled text prompts. Text2Avatar leverages a discrete codebook as an intermediate feature to establish a connection between text and avatars, enabling the disentanglement of features. Furthermore, to alleviate the scarcity of realistic style 3D human avatar data, we utilize a pre-trained unconditional 3D human avatar generation model to obtain a large amount of 3D avatar pseudo data, which allows Text2Avatar to achieve realistic style generation. Experimental results demonstrate that our method can generate realistic 3D avatars from coupled textual data, which is challenging for other existing methods in this field.
- [260] arXiv:2401.00719 (cross-list from cs.CV) [ pdf , other ]
-
Title: Depth Map Denoising Network and Lightweight Fusion Network for Enhanced 3D Face RecognitionAuthors: Ruizhuo Xu , Ke Wang , Chao Deng , Mei Wang , Xi Chen , Wenhui Huang , Junlan Feng , Weihong DengComments: Accepted by Pattern RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: With the increasing availability of consumer depth sensors, 3D face recognition (FR) has attracted more and more attention. However, the data acquired by these sensors are often coarse and noisy, making them impractical to use directly. In this paper, we introduce an innovative Depth map denoising network (DMDNet) based on the Denoising Implicit Image Function (DIIF) to reduce noise and enhance the quality of facial depth images for low-quality 3D FR. After generating clean depth faces using DMDNet, we further design a powerful recognition network called Lightweight Depth and Normal Fusion network (LDNFNet), which incorporates a multi-branch fusion block to learn unique and complementary features between different modalities such as depth and normal images. Comprehensive experiments conducted on four distinct low-quality databases demonstrate the effectiveness and robustness of our proposed methods. Furthermore, when combining DMDNet and LDNFNet, we achieve state-of-the-art results on the Lock3DFace database.
- [261] arXiv:2401.00736 (cross-list from cs.CV) [ pdf , other ]
-
Title: Diffusion Models, Image Super-Resolution And Everything: A SurveyAuthors: Brian B. Moser , Arundhati S. Shanbhag , Federico Raue , Stanislav Frolov , Sebastian Palacio , Andreas DengelSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract: Diffusion Models (DMs) have disrupted the image Super-Resolution (SR) field and further closed the gap between image quality and human perceptual preferences. They are easy to train and can produce very high-quality samples that exceed the realism of those produced by previous generative methods. Despite their promising results, they also come with new challenges that need further research: high computational demands, comparability, lack of explainability, color shifts, and more. Unfortunately, entry into this field is overwhelming because of the abundance of publications. To address this, we provide a unified recount of the theoretical foundations underlying DMs applied to image SR and offer a detailed analysis that underscores the unique characteristics and methodologies within this domain, distinct from broader existing reviews in the field. This survey articulates a cohesive understanding of DM principles and explores current research avenues, including alternative input domains, conditioning techniques, guidance mechanisms, corruption spaces, and zero-shot learning approaches. By offering a detailed examination of the evolution and current trends in image SR through the lens of DMs, this survey sheds light on the existing challenges and charts potential future directions, aiming to inspire further innovation in this rapidly advancing area.
- [262] arXiv:2401.00737 (cross-list from cs.IR) [ pdf , other ]
-
Title: Searching, fast and slow, through product catalogsAuthors: Dayananda Ubrangala , Juhi Sharma , Sharath Kumar Rangappa , Kiran R , Ravi Prasad Kondapalli , Laurent BouéJournal-ref: Microsoft Journal of Applied Research, Volume 20, 2024Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Abstract: String matching algorithms in the presence of abbreviations, such as in Stock Keeping Unit (SKU) product catalogs, remains a relatively unexplored topic. In this paper, we present a unified architecture for SKU search that provides both a real-time suggestion system (based on a Trie data structure) as well as a lower latency search system (making use of character level TF-IDF in combination with language model vector embeddings) where users initiate the search process explicitly. We carry out ablation studies that justify designing a complex search system composed of multiple components to address the delicate trade-off between speed and accuracy. Using SKU search in the Dynamics CRM as an example, we show how our system vastly outperforms, in all aspects, the results provided by the default search engine. Finally, we show how SKU descriptions may be enhanced via generative text models (using gpt-3.5-turbo) so that the consumers of the search results may get more context and a generally better experience when presented with the results of their SKU search.
- [263] arXiv:2401.00739 (cross-list from cs.CV) [ pdf , other ]
-
Title: DiffMorph: Text-less Image Morphing with Diffusion ModelsAuthors: Shounak ChatterjeeSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Text-conditioned image generation models are a prevalent use of AI image synthesis, yet intuitively controlling output guided by an artist remains challenging. Current methods require multiple images and textual prompts for each object to specify them as concepts to generate a single customized image.
On the other hand, our work, \verb|DiffMorph|, introduces a novel approach that synthesizes images that mix concepts without the use of textual prompts. Our work integrates a sketch-to-image module to incorporate user sketches as input. \verb|DiffMorph| takes an initial image with conditioning artist-drawn sketches to generate a morphed image.
We employ a pre-trained text-to-image diffusion model and fine-tune it to reconstruct each image faithfully. We seamlessly merge images and concepts from sketches into a cohesive composition. The image generation capability of our work is demonstrated through our results and a comparison of these with prompt-based image generation. - [264] arXiv:2401.00741 (cross-list from cs.CL) [ pdf , other ]
-
Title: ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world ScenariosAuthors: Junjie Ye , Guanyu Li , Songyang Gao , Caishuang Huang , Yilong Wu , Sixian Li , Xiaoran Fan , Shihan Dou , Qi Zhang , Tao Gui , Xuanjing HuangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes. However, these approaches rely on a limited set of scenarios where answers can be pre-determined, diverging from genuine needs. Furthermore, a sole emphasis on outcomes disregards the intricate capabilities essential for LLMs to effectively utilize tools. To tackle this issue, we propose ToolEyes, a fine-grained system tailored for the evaluation of the LLMs' tool learning capabilities in authentic scenarios. The system meticulously examines seven real-world scenarios, analyzing five dimensions crucial to LLMs in tool learning: format alignment, intent comprehension, behavior planning, tool selection, and answer organization. Additionally, ToolEyes incorporates a tool library boasting approximately 600 tools, serving as an intermediary between LLMs and the physical world. Evaluations involving ten LLMs across three categories reveal a preference for specific scenarios and limited cognitive abilities in tool learning. Intriguingly, expanding the model size even exacerbates the hindrance to tool learning. These findings offer instructive insights aimed at advancing the field of tool learning. The data is available att this https URL .
- [265] arXiv:2401.00756 (cross-list from cs.LG) [ pdf , other ]
-
Title: MPRE: Multi-perspective Patient Representation Extractor for Disease PredictionComments: Accepted by ICDM 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Patient representation learning based on electronic health records (EHR) is a critical task for disease prediction. This task aims to effectively extract useful information on dynamic features. Although various existing works have achieved remarkable progress, the model performance can be further improved by fully extracting the trends, variations, and the correlation between the trends and variations in dynamic features. In addition, sparse visit records limit the performance of deep learning models. To address these issues, we propose the Multi-perspective Patient Representation Extractor (MPRE) for disease prediction. Specifically, we propose Frequency Transformation Module (FTM) to extract the trend and variation information of dynamic features in the time-frequency domain, which can enhance the feature representation. In the 2D Multi-Extraction Network (2D MEN), we form the 2D temporal tensor based on trend and variation. Then, the correlations between trend and variation are captured by the proposed dilated operation. Moreover, we propose the First-Order Difference Attention Mechanism (FODAM) to calculate the contributions of differences in adjacent variations to the disease diagnosis adaptively. To evaluate the performance of MPRE and baseline methods, we conduct extensive experiments on two real-world public datasets. The experiment results show that MPRE outperforms state-of-the-art baseline methods in terms of AUROC and AUPRC.
- [266] arXiv:2401.00757 (cross-list from cs.SE) [ pdf , other ]
-
Title: A & B == B & A: Triggering Logical Reasoning Failures in Large Language ModelsAuthors: Yuxuan Wan , Wenxuan Wang , Yiliu Yang , Youliang Yuan , Jen-tse Huang , Pinjia He , Wenxiang Jiao , Michael R. LyuSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Logic in Computer Science (cs.LO)
Abstract: Recent advancements in large language models (LLMs) have propelled Artificial Intelligence (AI) to new heights, enabling breakthroughs in various tasks such as writing assistance, code generation, and machine translation. A significant distinction of advanced LLMs, such as ChatGPT, is their demonstrated ability to "reason." However, evaluating the reasoning ability of LLMs remains a challenge as most existing evaluations focus on their accuracy on the downstream tasks rather than directly assessing their reasoning processes. Efforts have been made to develop benchmarks and metrics to assess reasoning in LLMs, but they suffer from data leakage or limited scope. In this paper, we introduce LogicAsker, an automatic approach that comprehensively evaluates and improves the logical reasoning abilities of LLMs under a set of atomic reasoning skills based on propositional and predicate logic. The results provide insights into LLMs' reasoning abilities and reveal the logical rules the LLMs did not learn well. We evaluate LogicAsker on six widely deployed LLMs, including GPT-3, ChatGPT, GPT-4, Bard, Vicuna, and Guanaco. The results show that test cases from LogicAsker can find logical reasoning failures in different LLMs with a rate of 25\% - 94\%. In addition, the test cases of LogicAsker can be further used to design demonstration examples for in-context learning, which effectively improves the logical reasoning ability of LLMs, e.g., 10\% for GPT-4. As far as we know, our work is the first to create prompts based on testing results to improve LLMs' formal reasoning ability effectively. All the code, data, and results will be released for reproduction and future research.
- [267] arXiv:2401.00761 (cross-list from cs.SE) [ pdf , other ]
-
Title: The Earth is Flat? Unveiling Factual Errors in Large Language ModelsAuthors: Wenxuan Wang , Juluan Shi , Zhaopeng Tu , Youliang Yuan , Jen-tse Huang , Wenxiang Jiao , Michael R. LyuSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Large Language Models (LLMs) like ChatGPT are foundational in various applications due to their extensive knowledge from pre-training and fine-tuning. Despite this, they are prone to generating factual and commonsense errors, raising concerns in critical areas like healthcare, journalism, and education to mislead users. Current methods for evaluating LLMs' veracity are limited by test data leakage or the need for extensive human labor, hindering efficient and accurate error detection. To tackle this problem, we introduce a novel, automatic testing framework, FactChecker, aimed at uncovering factual inaccuracies in LLMs. This framework involves three main steps: First, it constructs a factual knowledge graph by retrieving fact triplets from a large-scale knowledge database. Then, leveraging the knowledge graph, FactChecker employs a rule-based approach to generates three types of questions (Yes-No, Multiple-Choice, and WH questions) that involve single-hop and multi-hop relations, along with correct answers. Lastly, it assesses the LLMs' responses for accuracy using tailored matching strategies for each question type. Our extensive tests on six prominent LLMs, including text-davinci-002, text-davinci-003, ChatGPT~(gpt-3.5-turbo, gpt-4), Vicuna, and LLaMA-2, reveal that FactChecker can trigger factual errors in up to 45\% of questions in these models. Moreover, we demonstrate that FactChecker's test cases can improve LLMs' factual accuracy through in-context learning and fine-tuning (e.g., llama-2-13b-chat's accuracy increase from 35.3\% to 68.5\%). We are making all code, data, and results available for future research endeavors.
- [268] arXiv:2401.00763 (cross-list from cs.SE) [ pdf , other ]
-
Title: New Job, New Gender? Measuring the Social Bias in Image Generation ModelsAuthors: Wenxuan Wang , Haonan Bai , Jen-tse Huang , Yuxuan Wan , Youliang Yuan , Haoyi Qiu , Nanyun Peng , Michael R. LyuSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract: Image generation models can generate or edit images from a given text. Recent advancements in image generation technology, exemplified by DALL-E and Midjourney, have been groundbreaking. These advanced models, despite their impressive capabilities, are often trained on massive Internet datasets, making them susceptible to generating content that perpetuates social stereotypes and biases, which can lead to severe consequences. Prior research on assessing bias within image generation models suffers from several shortcomings, including limited accuracy, reliance on extensive human labor, and lack of comprehensive analysis. In this paper, we propose BiasPainter, a novel metamorphic testing framework that can accurately, automatically and comprehensively trigger social bias in image generation models. BiasPainter uses a diverse range of seed images of individuals and prompts the image generation models to edit these images using gender, race, and age-neutral queries. These queries span 62 professions, 39 activities, 57 types of objects, and 70 personality traits. The framework then compares the edited images to the original seed images, focusing on any changes related to gender, race, and age. BiasPainter adopts a testing oracle that these characteristics should not be modified when subjected to neutral prompts. Built upon this design, BiasPainter can trigger the social bias and evaluate the fairness of image generation models. To evaluate the effectiveness of BiasPainter, we use BiasPainter to test five widely-used commercial image generation software and models, such as stable diffusion and Midjourney. Experimental results show that 100\% of the generated test cases can successfully trigger social bias in image generation models.
- [269] arXiv:2401.00773 (cross-list from cs.LG) [ pdf , other ]
-
Title: Unsupervised Outlier Detection using Random Subspace and Subsampling Ensembles of Dirichlet Process MixturesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Probabilistic mixture models are acknowledged as a valuable tool for unsupervised outlier detection owing to their interpretability and intuitive grounding in statistical principles. Within this framework, Dirichlet process mixture models emerge as a compelling alternative to conventional finite mixture models for both clustering and outlier detection tasks. However, despite their evident advantages, the widespread adoption of Dirichlet process mixture models in unsupervised outlier detection has been hampered by challenges related to computational inefficiency and sensitivity to outliers during the construction of detectors. To tackle these challenges, we propose a novel outlier detection method based on ensembles of Dirichlet process Gaussian mixtures. The proposed method is a fully unsupervised algorithm that capitalizes on random subspace and subsampling ensembles, not only ensuring efficient computation but also enhancing the robustness of the resulting outlier detector. Moreover, the proposed method leverages variational inference for Dirichlet process mixtures to ensure efficient and fast computation. Empirical studies with benchmark datasets demonstrate that our method outperforms existing approaches for unsupervised outlier detection.
- [270] arXiv:2401.00776 (cross-list from cs.RO) [ pdf , other ]
-
Title: Edge Computing based Human-Robot Cognitive Fusion: A Medical Case Study in the Autism Spectrum Disorder TherapyAuthors: Qin YangComments: This paper was accepted by the 38th AAAI 2024 workshop: "Cooperative Multi-Agent Systems Decision-Making and Learning: From Individual Needs to Swarm Intelligence"Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: In recent years, edge computing has served as a paradigm that enables many future technologies like AI, Robotics, IoT, and high-speed wireless sensor networks (like 5G) by connecting cloud computing facilities and services to the end users. Especially in medical and healthcare applications, it provides remote patient monitoring and increases voluminous multimedia. From the robotics angle, robot-assisted therapy (RAT) is an active-assistive robotic technology in rehabilitation robotics, attracting many researchers to study and benefit people with disability like autism spectrum disorder (ASD) children. However, the main challenge of RAT is that the model capable of detecting the affective states of ASD people exists and can recall individual preferences. Moreover, involving expert diagnosis and recommendations to guide robots in updating the therapy approach to adapt to different statuses and scenarios is a crucial part of the ASD therapy process. This paper proposes the architecture of edge cognitive computing by combining human experts and assisted robots collaborating in the same framework to help ASD patients with long-term support. By integrating the real-time computing and analysis of a new cognitive robotic model for ASD therapy, the proposed architecture can achieve a seamless remote diagnosis, round-the-clock symptom monitoring, emergency warning, therapy alteration, and advanced assistance.
- [271] arXiv:2401.00779 (cross-list from cs.CL) [ pdf , other ]
-
Title: Temporal Validity Change PredictionComments: 9 pages, 9 figures, 3 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Temporal validity is an important property of text that is useful for many downstream applications, such as recommender systems, conversational AI, or story understanding. Existing benchmarking tasks often require models to identify the temporal validity duration of a single statement. However, in many cases, additional contextual information, such as sentences in a story or posts on a social media profile, can be collected from the available text stream. This contextual information may greatly alter the duration for which a statement is expected to be valid. We propose Temporal Validity Change Prediction, a natural language processing task benchmarking the capability of machine learning models to detect contextual statements that induce such change. We create a dataset consisting of temporal target statements sourced from Twitter and crowdsource sample context statements. We then benchmark a set of transformer-based language models on our dataset. Finally, we experiment with temporal validity duration prediction as an auxiliary task to improve the performance of the state-of-the-art model.
- [272] arXiv:2401.00788 (cross-list from cs.CL) [ pdf , other ]
-
Title: Astraios: Parameter-Efficient Instruction Tuning Code Large Language ModelsAuthors: Terry Yue Zhuo , Armel Zebaze , Nitchakarn Suppattarachai , Leandro von Werra , Harm de Vries , Qian Liu , Niklas MuennighoffComments: 25 pages (12 main), 19 figures, 8 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: The high cost of full-parameter fine-tuning (FFT) of Large Language Models (LLMs) has led to a series of parameter-efficient fine-tuning (PEFT) methods. However, it remains unclear which methods provide the best cost-performance trade-off at different model scales. We introduce Astraios, a suite of 28 instruction-tuned OctoCoder models using 7 tuning methods and 4 model sizes up to 16 billion parameters. Through investigations across 5 tasks and 8 different datasets encompassing both code comprehension and code generation tasks, we find that FFT generally leads to the best downstream performance across all scales, and PEFT methods differ significantly in their efficacy based on the model scale. LoRA usually offers the most favorable trade-off between cost and performance. Further investigation into the effects of these methods on both model robustness and code security reveals that larger models tend to demonstrate reduced robustness and less security. At last, we explore the relationships among updated parameters, cross-entropy loss, and task performance. We find that the tuning effectiveness observed in small models generalizes well to larger models, and the validation loss in instruction tuning can be a reliable indicator of overall downstream performance.
- [273] arXiv:2401.00850 (cross-list from cs.CV) [ pdf , other ]
-
Title: Refining Pre-Trained Motion ModelsComments: Accepted at ICRA 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Given the difficulty of manually annotating motion in video, the current best motion estimation methods are trained with synthetic data, and therefore struggle somewhat due to a train/test gap. Self-supervised methods hold the promise of training directly on real video, but typically perform worse. These include methods trained with warp error (i.e., color constancy) combined with smoothness terms, and methods that encourage cycle-consistency in the estimates (i.e., tracking backwards should yield the opposite trajectory as tracking forwards). In this work, we take on the challenge of improving state-of-the-art supervised models with self-supervised training. We find that when the initialization is supervised weights, most existing self-supervision techniques actually make performance worse instead of better, which suggests that the benefit of seeing the new data is overshadowed by the noise in the training signal. Focusing on obtaining a "clean" training signal from real-world unlabelled video, we propose to separate label-making and training into two distinct stages. In the first stage, we use the pre-trained model to estimate motion in a video, and then select the subset of motion estimates which we can verify with cycle-consistency. This produces a sparse but accurate pseudo-labelling of the video. In the second stage, we fine-tune the model to reproduce these outputs, while also applying augmentations on the input. We complement this boot-strapping method with simple techniques that densify and re-balance the pseudo-labels, ensuring that we do not merely train on "easy" tracks. We show that our method yields reliable gains over fully-supervised methods in real videos, for both short-term (flow-based) and long-range (multi-frame) pixel tracking.
- [274] arXiv:2401.00867 (cross-list from cs.LG) [ pdf , other ]
-
Title: Tensor Networks for Explainable Machine Learning in CybersecurityComments: 9 pages, 9 figures, 2 table, minor typos correctedSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
Abstract: In this paper we show how tensor networks help in developing explainability of machine learning algorithms. Specifically, we develop an unsupervised clustering algorithm based on Matrix Product States (MPS) and apply it in the context of a real use-case of adversary-generated threat intelligence. Our investigation proves that MPS rival traditional deep learning models such as autoencoders and GANs in terms of performance, while providing much richer model interpretability. Our approach naturally facilitates the extraction of feature-wise probabilities, Von Neumann Entropy, and mutual information, offering a compelling narrative for classification of anomalies and fostering an unprecedented level of transparency and interpretability, something fundamental to understand the rationale behind artificial intelligence decisions.
- [275] arXiv:2401.00870 (cross-list from cs.CR) [ pdf , other ]
-
Title: Teach Large Language Models to Forget PrivacyComments: 32 pagesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have proven powerful, but the risk of privacy leakage remains a significant concern. Traditional privacy-preserving methods, such as Differential Privacy and Homomorphic Encryption, are inadequate for black-box API-only settings, demanding either model transparency or heavy computational resources. We propose Prompt2Forget (P2F), the first framework designed to tackle the LLM local privacy challenge by teaching LLM to forget. The method involves decomposing full questions into smaller segments, generating fabricated answers, and obfuscating the model's memory of the original input. A benchmark dataset was crafted with questions containing privacy-sensitive information from diverse fields. P2F achieves zero-shot generalization, allowing adaptability across a wide range of use cases without manual adjustments. Experimental results indicate P2F's robust capability to obfuscate LLM's memory, attaining a forgetfulness score of around 90\% without any utility loss. This represents an enhancement of up to 63\% when contrasted with the naive direct instruction technique, highlighting P2F's efficacy in mitigating memory retention of sensitive information within LLMs. Our findings establish the first benchmark in the novel field of the LLM forgetting task, representing a meaningful advancement in privacy preservation in the emerging LLM domain.
- [276] arXiv:2401.00876 (cross-list from cs.LG) [ pdf , other ]
-
Title: Balanced Graph Structure Information for Brain Disease DetectionComments: Presented at Pacific Rim Knowledge Acquisition Workshop (PKAW) 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
Abstract: Analyzing connections between brain regions of interest (ROI) is vital to detect neurological disorders such as autism or schizophrenia. Recent advancements employ graph neural networks (GNNs) to utilize graph structures in brains, improving detection performances. Current methods use correlation measures between ROI's blood-oxygen-level-dependent (BOLD) signals to generate the graph structure. Other methods use the training samples to learn the optimal graph structure through end-to-end learning. However, implementing those methods independently leads to some issues with noisy data for the correlation graphs and overfitting problems for the optimal graph. In this work, we proposed Bargrain (balanced graph structure for brains), which models two graph structures: filtered correlation matrix and optimal sample graph using graph convolution networks (GCNs). This approach aims to get advantages from both graphs and address the limitations of only relying on a single type of structure. Based on our extensive experiment, Bargrain outperforms state-of-the-art methods in classification tasks on brain disease datasets, as measured by average F1 scores.
- [277] arXiv:2401.00883 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Automating Leukemia Diagnosis with Autoencoders: A Comparative StudySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Leukemia is one of the most common and death-threatening types of cancer that threaten human life. Medical data from some of the patient's critical parameters contain valuable information hidden among these data. On this subject, deep learning can be used to extract this information. In this paper, AutoEncoders have been used to develop valuable features to help the precision of leukemia diagnosis. It has been attempted to get the best activation function and optimizer to use in AutoEncoder and designed the best architecture for this neural network. The proposed architecture is compared with this area's classical machine learning models. Our proposed method performs better than other machine learning in precision and f1-score metrics by more than 11%.
- [278] arXiv:2401.00893 (cross-list from cs.SI) [ pdf , other ]
-
Title: Social-LLM: Modeling User Behavior at Scale using Language Models and Social Network DataComments: 10 pages, 5 figures, 2 tablesSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI)
Abstract: The proliferation of social network data has unlocked unprecedented opportunities for extensive, data-driven exploration of human behavior. The structural intricacies of social networks offer insights into various computational social science issues, particularly concerning social influence and information diffusion. However, modeling large-scale social network data comes with computational challenges. Though large language models make it easier than ever to model textual content, any advanced network representation methods struggle with scalability and efficient deployment to out-of-sample users. In response, we introduce a novel approach tailored for modeling social network data in user detection tasks. This innovative method integrates localized social network interactions with the capabilities of large language models. Operating under the premise of social network homophily, which posits that socially connected users share similarities, our approach is designed to address these challenges. We conduct a thorough evaluation of our method across seven real-world social network datasets, spanning a diverse range of topics and detection tasks, showcasing its applicability to advance research in computational social science.
- [279] arXiv:2401.00897 (cross-list from cs.CV) [ pdf , other ]
-
Title: Masked Modeling for Self-supervised Representation Learning on Vision and BeyondAuthors: Siyuan Li , Luyuan Zhang , Zedong Wang , Di Wu , Lirong Wu , Zicheng Liu , Jun Xia , Cheng Tan , Yang Liu , Baigui Sun , Stan Z. LiComments: Preprint v2 (fix typos and citations). GitHub project at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and the low dependence on labeled data. Among these varied self-supervised techniques, masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training. This paradigm enables deep models to learn robust representations and has demonstrated exceptional performance in the context of computer vision, natural language processing, and other modalities. In this survey, we present a comprehensive review of the masked modeling framework and its methodology. We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more. Then, we systematically investigate its wide-ranging applications across domains. Furthermore, we also explore the commonalities and differences between masked modeling methods in different fields. Toward the end of this paper, we conclude by discussing the limitations of current techniques and point out several potential avenues for advancing masked modeling research. A paper list project with this survey is available at \url{ this https URL }.
- [280] arXiv:2401.00907 (cross-list from cs.LG) [ pdf , other ]
-
Title: LaFFi: Leveraging Hybrid Natural Language Feedback for Fine-tuning Language ModelsAuthors: Qianxi Li , Yingyue Cao , Jikun Kang , Tianpei Yang , Xi Chen , Jun Jin , Matthew E. TaylorComments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024 ( this https URL )Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Fine-tuning Large Language Models (LLMs) adapts a trained model to specific downstream tasks, significantly improving task-specific performance. Supervised Fine-Tuning (SFT) is a common approach, where an LLM is trained to produce desired answers. However, LLMs trained with SFT sometimes make simple mistakes and result in hallucinations on reasoning tasks such as question-answering. Without external feedback, it is difficult for SFT to learn a good mapping between the question and the desired answer, especially with a small dataset. This paper introduces an alternative to SFT called Natural Language Feedback for Finetuning LLMs (LaFFi). LaFFi has LLMs directly predict the feedback they will receive from an annotator. We find that requiring such reflection can significantly improve the accuracy in in-domain question-answering tasks, providing a promising direction for the application of natural language feedback in the realm of SFT LLMs. Additional ablation studies show that the portion of human-annotated data in the annotated datasets affects the fine-tuning performance.
- [281] arXiv:2401.00926 (cross-list from cs.CV) [ pdf , other ]
-
Title: Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood DiseasesAuthors: Yifei Chen , Chenyan Zhang , Ben Chen , Yiyu Huang , Yifei Sun , Changmiao Wang , Xianjun Fu , Yuxing Dai , Feiwei Qin , Yong Peng , Yu GaoComments: 15 pages, 11 figures, accept Computers in Biology and Medicine 2024Journal-ref: Computers in Biology and Medicine 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In standard hospital blood tests, the traditional process requires doctors to manually isolate leukocytes from microscopic images of patients' blood using microscopes. These isolated leukocytes are then categorized via automatic leukocyte classifiers to determine the proportion and volume of different types of leukocytes present in the blood samples, aiding disease diagnosis. This methodology is not only time-consuming and labor-intensive, but it also has a high propensity for errors due to factors such as image quality and environmental conditions, which could potentially lead to incorrect subsequent classifications and misdiagnosis. To address these issues, this paper proposes an innovative method of leukocyte detection: the Multi-level Feature Fusion and Deformable Self-attention DETR (MFDS-DETR). To tackle the issue of leukocyte scale disparity, we designed the High-level Screening-feature Fusion Pyramid (HS-FPN), enabling multi-level fusion. This model uses high-level features as weights to filter low-level feature information via a channel attention module and then merges the screened information with the high-level features, thus enhancing the model's feature expression capability. Further, we address the issue of leukocyte feature scarcity by incorporating a multi-scale deformable self-attention module in the encoder and using the self-attention and cross-deformable attention mechanisms in the decoder, which aids in the extraction of the global features of the leukocyte feature maps. The effectiveness, superiority, and generalizability of the proposed MFDS-DETR method are confirmed through comparisons with other cutting-edge leukocyte detection models using the private WBCDD, public LISC and BCCD datasets. Our source code and private WBCCD dataset are available at this https URL .
- [282] arXiv:2401.00961 (cross-list from cs.LG) [ pdf , other ]
-
Title: Automated Model Selection for Tabular DataAuthors: Avinash Amballa , Anmol Mekala , Gayathri Akkinapalli , Manas Madine , Naga Pavana Priya Yarrabolu , Przemyslaw A. GrabowiczComments: 10 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Structured data in the form of tabular datasets contain features that are distinct and discrete, with varying individual and relative importances to the target. Combinations of one or more features may be more predictive and meaningful than simple individual feature contributions. R's mixed effect linear models library allows users to provide such interactive feature combinations in the model design. However, given many features and possible interactions to select from, model selection becomes an exponentially difficult task. We aim to automate the model selection process for predictions on tabular datasets incorporating feature interactions while keeping computational costs small. The framework includes two distinct approaches for feature selection: a Priority-based Random Grid Search and a Greedy Search method. The Priority-based approach efficiently explores feature combinations using prior probabilities to guide the search. The Greedy method builds the solution iteratively by adding or removing features based on their impact. Experiments on synthetic demonstrate the ability to effectively capture predictive feature combinations.
- [283] arXiv:2401.00964 (cross-list from cs.CV) [ pdf , other ]
-
Title: Data Augmentation Techniques for Cross-Domain WiFi CSI-based Human Activity RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The recognition of human activities based on WiFi Channel State Information (CSI) enables contactless and visual privacy-preserving sensing in indoor environments. However, poor model generalization, due to varying environmental conditions and sensing hardware, is a well-known problem in this space. To address this issue, in this work, data augmentation techniques commonly used in image-based learning are applied to WiFi CSI to investigate their effects on model generalization performance in cross-scenario and cross-system settings. In particular, we focus on the generalization between line-of-sight (LOS) and non-line-of-sight (NLOS) through-wall scenarios, as well as on the generalization between different antenna systems, which remains under-explored. We collect and make publicly available a dataset of CSI amplitude spectrograms of human activities. Utilizing this data, an ablation study is conducted in which activity recognition models based on the EfficientNetV2 architecture are trained, allowing us to assess the effects of each augmentation on model generalization performance. The gathered results show that specific combinations of simple data augmentation techniques applied to CSI amplitude data can significantly improve cross-scenario and cross-system generalization.
- [284] arXiv:2401.00974 (cross-list from cs.LG) [ pdf , other ]
-
Title: Downstream Task-Oriented Generative Model Selections on Synthetic Data Training for Fraud Detection ModelsComments: The following article has been accepted by ICAIF22, Synthetic Data for AI in Finance; see this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Devising procedures for downstream task-oriented generative model selections is an unresolved problem of practical importance. Existing studies focused on the utility of a single family of generative models. They provided limited insights on how synthetic data practitioners select the best family generative models for synthetic training tasks given a specific combination of machine learning model class and performance metric. In this paper, we approach the downstream task-oriented generative model selections problem in the case of training fraud detection models and investigate the best practice given different combinations of model interpretability and model performance constraints. Our investigation supports that, while both Neural Network(NN)-based and Bayesian Network(BN)-based generative models are both good to complete synthetic training task under loose model interpretability constrain, the BN-based generative models is better than NN-based when synthetic training fraud detection model under strict model interpretability constrain. Our results provides practical guidance for machine learning practitioner who is interested in replacing their training dataset from real to synthetic, and shed lights on more general downstream task-oriented generative model selection problems.
- [285] arXiv:2401.00976 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: Nature-Inspired Algorithms in Optimization: Introduction, Hybridization and InsightsAuthors: Xin-She YangComments: 15 pages, 4 figuresSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: Many problems in science and engineering are optimization problems, which may require sophisticated optimization techniques to solve. Nature-inspired algorithms are a class of metaheuristic algorithms for optimization, and some algorithms or variants are often developed by hybridization. Benchmarking is also important in evaluating the performance of optimization algorithms. This chapter focuses on the overview of optimization, nature-inspired algorithms and the role of hybridization. We will also highlight some issues with hybridization of algorithms.
- [286] arXiv:2401.00986 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Real-Time Object Detection in Occluded Environment with Background Cluttering Effects Using Deep LearningSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Detection of small, undetermined moving objects or objects in an occluded environment with a cluttered background is the main problem of computer vision. This greatly affects the detection accuracy of deep learning models. To overcome these problems, we concentrate on deep learning models for real-time detection of cars and tanks in an occluded environment with a cluttered background employing SSD and YOLO algorithms and improved precision of detection and reduce problems faced by these models. The developed method makes the custom dataset and employs a preprocessing technique to clean the noisy dataset. For training the developed model we apply the data augmentation technique to balance and diversify the data. We fine-tuned, trained, and evaluated these models on the established dataset by applying these techniques and highlighting the results we got more accurately than without applying these techniques. The accuracy and frame per second of the SSD-Mobilenet v2 model are higher than YOLO V3 and YOLO V4. Furthermore, by employing various techniques like data enhancement, noise reduction, parameter optimization, and model fusion we improve the effectiveness of detection and recognition. We further added a counting algorithm, and target attributes experimental comparison, and made a graphical user interface system for the developed model with features of object counting, alerts, status, resolution, and frame per second. Subsequently, to justify the importance of the developed method analysis of YOLO V3, V4, and SSD were incorporated. Which resulted in the overall completion of the proposed method.
- [287] arXiv:2401.01007 (cross-list from cs.NI) [ pdf , other ]
-
Title: Towards Net-Zero Carbon Emissions in Network AI for 6G and BeyondJournal-ref: published as Early Access at the IEEE Communications Magazine, 2023 (URL: https://ieeexplore.ieee.org/abstract/document/10247147)Subjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: A global effort has been initiated to reduce the worldwide greenhouse gas (GHG) emissions, primarily carbon emissions, by half by 2030 and reach net-zero by 2050. The development of 6G must also be compliant with this goal. Unfortunately, developing a sustainable and net-zero emission systems to meet the users' fast growing demands on mobile services, especially smart services and applications, may be much more challenging than expected. Particularly, despite the energy efficiency improvement in both hardware and software designs, the overall energy consumption and carbon emission of mobile networks are still increasing at a tremendous speed. The growing penetration of resource-demanding AI algorithms and solutions further exacerbate this challenge. In this article, we identify the major emission sources and introduce an evaluation framework for analyzing the lifecycle of network AI implementations. A novel joint dynamic energy trading and task allocation optimization framework, called DETA, has been introduced to reduce the overall carbon emissions. We consider a federated edge intelligence-based network AI system as a case study to verify the effectiveness of our proposed solution. Experimental results based on a hardware prototype suggest that our proposed solution can reduce carbon emissions of network AI systems by up to 74.9%. Finally, open problems and future directions are discussed.
- [288] arXiv:2401.01008 (cross-list from cs.CV) [ pdf , other ]
-
Title: Fast Inference Through The Reuse Of Attention Maps In Diffusion ModelsAuthors: Rosco Hunter , Łukasz Dudziak , Mohamed S. Abdelfattah , Abhinav Mehrotra , Sourav Bhattacharya , Hongkai WenSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Text-to-image diffusion models have demonstrated unprecedented abilities at flexible and realistic image synthesis. However, the iterative process required to produce a single image is costly and incurs a high latency, prompting researchers to further investigate its efficiency. Typically, improvements in latency have been achieved in two ways: (1) training smaller models through knowledge distillation (KD); and (2) adopting techniques from ODE-theory to facilitate larger step sizes. In contrast, we propose a training-free approach that does not alter the step-size of the sampler. Specifically, we find the repeated calculation of attention maps to be both costly and redundant; therefore, we propose a structured reuse of attention maps during sampling. Our initial reuse policy is motivated by rudimentary ODE-theory, which suggests that reuse is most suitable late in the sampling procedure. After noting a number of limitations in this theoretical approach, we empirically search for a better policy. Unlike methods that rely on KD, our reuse policies can easily be adapted to a variety of setups in a plug-and-play manner. Furthermore, when applied to Stable Diffusion-1.5, our reuse policies reduce latency with minimal repercussions on sample quality.
- [289] arXiv:2401.01044 (cross-list from cs.SD) [ pdf , other ]
-
Title: Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio GenerationComments: Demo and implementation at this https URLSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Abstract: Recent advancements in diffusion models and large language models (LLMs) have significantly propelled the field of AIGC. Text-to-Audio (TTA), a burgeoning AIGC application designed to generate audio from natural language prompts, is attracting increasing attention. However, existing TTA studies often struggle with generation quality and text-audio alignment, especially for complex textual inputs. Drawing inspiration from state-of-the-art Text-to-Image (T2I) diffusion models, we introduce Auffusion, a TTA system adapting T2I model frameworks to TTA task, by effectively leveraging their inherent generative strengths and precise cross-modal alignment. Our objective and subjective evaluations demonstrate that Auffusion surpasses previous TTA approaches using limited data and computational resource. Furthermore, previous studies in T2I recognizes the significant impact of encoder choice on cross-modal alignment, like fine-grained details and object bindings, while similar evaluation is lacking in prior TTA works. Through comprehensive ablation studies and innovative cross-attention map visualizations, we provide insightful assessments of text-audio alignment in TTA. Our findings reveal Auffusion's superior capability in generating audios that accurately match textual descriptions, which further demonstrated in several related tasks, such as audio style transfer, inpainting and other manipulations. Our implementation and demos are available at this https URL .
- [290] arXiv:2401.01054 (cross-list from cs.LG) [ pdf , other ]
-
Title: Elastic Multi-Gradient Descent for Parallel Continual LearningComments: Submited to IEEE TPAMISubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The goal of Continual Learning (CL) is to continuously learn from new data streams and accomplish the corresponding tasks. Previously studied CL assumes that data are given in sequence nose-to-tail for different tasks, thus indeed belonging to Serial Continual Learning (SCL). This paper studies the novel paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios, where a diverse set of tasks is encountered at different time points. PCL presents challenges due to the training of an unspecified number of tasks with varying learning progress, leading to the difficulty of guaranteeing effective model updates for all encountered tasks. In our previous conference work, we focused on measuring and reducing the discrepancy among gradients in a multi-objective optimization problem, which, however, may still contain negative transfers in every model update. To address this issue, in the dynamic multi-objective optimization problem, we introduce task-specific elastic factors to adjust the descent direction towards the Pareto front. The proposed method, called Elastic Multi-Gradient Descent (EMGD), ensures that each update follows an appropriate Pareto descent direction, minimizing any negative impact on previously learned tasks. To balance the training between old and new tasks, we also propose a memory editing mechanism guided by the gradient computed using EMGD. This editing process updates the stored data points, reducing interference in the Pareto descent direction from previous tasks. Experiments on public datasets validate the effectiveness of our EMGD in the PCL setting.
- [291] arXiv:2401.01055 (cross-list from cs.CL) [ pdf , other ]
-
Title: LLaMA Beyond English: An Empirical Study on Language Capability TransferSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In recent times, substantial advancements have been witnessed in large language models (LLMs), exemplified by ChatGPT, showcasing remarkable proficiency across a range of complex tasks. However, many mainstream LLMs (e.g. LLaMA) are pretrained on English-dominant corpus, which limits their performance in other non-English languages. In this paper, we focus on how to effectively transfer the capabilities of language generation and following instructions to a non-English language. To answer this question, we conduct an extensive empirical investigation based on LLaMA, accumulating over 1440 GPU hours. We analyze the impact of key factors such as vocabulary extension, further pretraining, and instruction tuning on transfer. To accurately assess the model's level of knowledge, we employ four widely used standardized testing benchmarks: C-Eval, MMLU, AGI-Eval, and GAOKAO-Bench. Furthermore, a comprehensive evaluation of the model's response quality is conducted, considering aspects such as accuracy, fluency, informativeness, logical coherence, and harmlessness, based on LLM-Eval, a benchmarks consisting instruction tasks from 17 diverse categories. Our evaluation results demonstrate that comparable performance to state-of-the-art transfer models can be achieved with less than 1% of the pretraining data, both in terms of knowledge alignment and response quality. Furthermore, the experimental outcomes across the thirteen low-resource languages also exhibit similar trends. We anticipate that the conclusions revealed by the experiments will aid the community in developing non-English LLMs.
- [292] arXiv:2401.01065 (cross-list from cs.CV) [ pdf , other ]
-
Title: BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous DrivingAuthors: Dafeng Wei , Tian Gao , Zhengyu Jia , Changwei Cai , Chengkai Hou , Peng Jia , Fu Liu , Kun Zhan , Jingchen Fan , Yixing Zhao , Yang WangComments: Under review of CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The demand for the retrieval of complex scene data in autonomous driving is increasing, especially as passenger vehicles have been equipped with the ability to navigate urban settings, with the imperative to address long-tail scenarios. Meanwhile, under the pre-existing two dimensional image retrieval method, some problems may arise with scene retrieval, such as lack of global feature representation and subpar text retrieval ability. To address these issues, we have proposed \textbf{BEV-CLIP}, the first multimodal Bird's-Eye View(BEV) retrieval methodology that utilizes descriptive text as an input to retrieve corresponding scenes. This methodology applies the semantic feature extraction abilities of a large language model (LLM) to facilitate zero-shot retrieval of extensive text descriptions, and incorporates semi-structured information from a knowledge graph to improve the semantic richness and variety of the language embedding. Our experiments result in 87.66% accuracy on NuScenes dataset in text-to-BEV feature retrieval. The demonstrated cases in our paper support that our retrieval method is also indicated to be effective in identifying certain long-tail corner scenes.
- [293] arXiv:2401.01068 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Discovering Significant Topics from Legal Decisions with Selective InferenceAuthors: Jerrold SohComments: This is an accepted manuscript of work forthcoming in PhilTrans A. Please cite the publisher's version onlySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We propose and evaluate an automated pipeline for discovering significant topics from legal decision texts by passing features synthesized with topic models through penalised regressions and post-selection significance tests. The method identifies case topics significantly correlated with outcomes, topic-word distributions which can be manually-interpreted to gain insights about significant topics, and case-topic weights which can be used to identify representative cases for each topic. We demonstrate the method on a new dataset of domain name disputes and a canonical dataset of European Court of Human Rights violation cases. Topic models based on latent semantic analysis as well as language model embeddings are evaluated. We show that topics derived by the pipeline are consistent with legal doctrines in both areas and can be useful in other related legal analysis tasks.
- [294] arXiv:2401.01078 (cross-list from cs.CL) [ pdf , other ]
-
Title: Vietnamese Poem Generation & The Prospect Of Cross-Language Poem-To-Poem TranslationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Poetry generation has been a challenging task in the field of Natural Language Processing, as it requires the model to understand the nuances of language, sentiment, and style. In this paper, we propose using Large Language Models to generate Vietnamese poems of various genres from natural language prompts, thereby facilitating an intuitive process with enhanced content control. Our most efficacious model, the GPT-3 Babbage variant, achieves a custom evaluation score of 0.8, specifically tailored to the "luc bat" genre of Vietnamese poetry. Furthermore, we also explore the idea of paraphrasing poems into normal text prompts and yield a relatively high score of 0.781 in the "luc bat" genre. This experiment presents the potential for cross-Language poem-to-poem translation with translated poems as the inputs while concurrently maintaining complete control over the generated content.
- [295] arXiv:2401.01089 (cross-list from cs.CL) [ pdf , other ]
-
Title: Quokka: An Open-source Large Language Model ChatBot for Material ScienceComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Abstract: This paper presents the development of a specialized chatbot for materials science, leveraging the Llama-2 language model, and continuing pre-training on the expansive research articles in the materials science domain from the S2ORC dataset. The methodology involves an initial pretraining phase on over one million domain-specific papers, followed by an instruction-tuning process to refine the chatbot's capabilities. The chatbot is designed to assist researchers, educators, and students by providing instant, context-aware responses to queries in the field of materials science. We make the four trained checkpoints (7B, 13B, with or without chat ability) freely available to the research community at this https URL .
- [296] arXiv:2401.01119 (cross-list from cs.LG) [ pdf , other ]
-
Title: Utilizing Autoregressive Networks for Full Lifecycle Data Generation of Rolling Bearings for RUL PredictionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The prediction of rolling bearing lifespan is of significant importance in industrial production. However, the scarcity of high-quality, full lifecycle data has been a major constraint in achieving precise predictions. To address this challenge, this paper introduces the CVGAN model, a novel framework capable of generating one-dimensional vibration signals in both horizontal and vertical directions, conditioned on historical vibration data and remaining useful life. In addition, we propose an autoregressive generation method that can iteratively utilize previously generated vibration information to guide the generation of current signals. The effectiveness of the CVGAN model is validated through experiments conducted on the PHM 2012 dataset. Our findings demonstrate that the CVGAN model, in terms of both MMD and FID metrics, outperforms many advanced methods in both autoregressive and non-autoregressive generation modes. Notably, training using the full lifecycle data generated by the CVGAN model significantly improves the performance of the predictive model. This result highlights the effectiveness of the data generated by CVGans in enhancing the predictive power of these models.
- [297] arXiv:2401.01124 (cross-list from cs.LG) [ pdf , other ]
-
Title: Explainable Adaptive Tree-based Model Selection for Time Series ForecastingComments: Accepted and presented at ICDM 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Tree-based models have been successfully applied to a wide variety of tasks, including time series forecasting. They are increasingly in demand and widely accepted because of their comparatively high level of interpretability. However, many of them suffer from the overfitting problem, which limits their application in real-world decision-making. This problem becomes even more severe in online-forecasting settings where time series observations are incrementally acquired, and the distributions from which they are drawn may keep changing over time. In this context, we propose a novel method for the online selection of tree-based models using the TreeSHAP explainability method in the task of time series forecasting. We start with an arbitrary set of different tree-based models. Then, we outline a performance-based ranking with a coherent design to make TreeSHAP able to specialize the tree-based forecasters across different regions in the input time series. In this framework, adequate model selection is performed online, adaptively following drift detection in the time series. In addition, explainability is supported on three levels, namely online input importance, model selection, and model output explanation. An extensive empirical study on various real-world datasets demonstrates that our method achieves excellent or on-par results in comparison to the state-of-the-art approaches as well as several baselines.
- [298] arXiv:2401.01141 (cross-list from cs.NE) [ pdf , other ]
-
Title: Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edgeSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Abstract: Including Artificial Neural Networks in embedded systems at the edge allows applications to exploit Artificial Intelligence capabilities directly within devices operating at the network periphery. This paper introduces Spiker+, a comprehensive framework for generating efficient, low-power, and low-area customized Spiking Neural Networks (SNN) accelerators on FPGA for inference at the edge. Spiker+ presents a configurable multi-layer hardware SNN, a library of highly efficient neuron architectures, and a design framework, enabling the development of complex neural network accelerators with few lines of Python code. Spiker+ is tested on two benchmark datasets, the MNIST and the Spiking Heidelberg Digits (SHD). On the MNIST, it demonstrates competitive performance compared to state-of-the-art SNN accelerators. It outperforms them in terms of resource allocation, with a requirement of 7,612 logic cells and 18 Block RAMs (BRAMs), which makes it fit in very small FPGA, and power consumption, draining only 180mW for a complete inference on an input image. The latency is comparable to the ones observed in the state-of-the-art, with 780us/img. To the authors' knowledge, Spiker+ is the first SNN accelerator tested on the SHD. In this case, the accelerator requires 18,268 logic cells and 51 BRAM, with an overall power consumption of 430mW and a latency of 54 us for a complete inference on input data. This underscores the significance of Spiker+ in the hardware-accelerated SNN landscape, making it an excellent solution to deploy configurable and tunable SNN architectures in resource and power-constrained edge applications.
- [299] arXiv:2401.01172 (cross-list from cs.LG) [ pdf , other ]
-
Title: Quadratic Time-Frequency Analysis of Vibration Signals for Diagnosing Bearing FaultsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Diagnosis of bearing faults is paramount to reducing maintenance costs and operational breakdowns. Bearing faults are primary contributors to machine vibrations, and analyzing their signal morphology offers insights into their health status. Unfortunately, existing approaches are optimized for controlled environments, neglecting realistic conditions such as time-varying rotational speeds and the vibration's non-stationary nature. This paper presents a fusion of time-frequency analysis and deep learning techniques to diagnose bearing faults under time-varying speeds and varying noise levels. First, we formulate the bearing fault-induced vibrations and discuss the link between their non-stationarity and the bearing's inherent and operational parameters. We also elucidate quadratic time-frequency distributions and validate their effectiveness in resolving distinctive dynamic patterns associated with different bearing faults. Based on this, we design a time-frequency convolutional neural network (TF-CNN) to diagnose various faults in rolling-element bearings. Our experimental findings undeniably demonstrate the superior performance of TF-CNN in comparison to recently developed techniques. They also assert its versatility in capturing fault-relevant non-stationary features that couple with speed changes and show its exceptional resilience to noise, consistently surpassing competing methods across various signal-to-noise ratios and performance metrics. Altogether, the TF-CNN achieves substantial accuracy improvements up to 15%, in severe noise conditions.
- [300] arXiv:2401.01179 (cross-list from cs.CV) [ pdf , other ]
-
Title: Freeze the backbones: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-trainingComments: Accepted by ICASSP 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Modern healthcare often utilises radiographic images alongside textual reports for diagnostics, encouraging the use of Vision-Language Self-Supervised Learning (VL-SSL) with large pre-trained models to learn versatile medical vision representations. However, most existing VL-SSL frameworks are trained end-to-end, which is computation-heavy and can lose vital prior information embedded in pre-trained encoders. To address both issues, we introduce the backbone-agnostic Adaptor framework, which preserves medical knowledge in pre-trained image and text encoders by keeping them frozen, and employs a lightweight Adaptor module for cross-modal learning. Experiments on medical image classification and segmentation tasks across three datasets reveal that our framework delivers competitive performance while cutting trainable parameters by over 90% compared to current pre-training approaches. Notably, when fine-tuned with just 1% of data, Adaptor outperforms several Transformer-based methods trained on full datasets in medical image segmentation.
- [301] arXiv:2401.01180 (cross-list from cs.CV) [ pdf , other ]
-
Title: Accurate and Efficient Urban Street Tree Inventory with Deep Learning on Mobile Phone ImageryComments: 8 Pages, 7 figures and 5 TablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Deforestation, a major contributor to climate change, poses detrimental consequences such as agricultural sector disruption, global warming, flash floods, and landslides. Conventional approaches to urban street tree inventory suffer from inaccuracies and necessitate specialised equipment. To overcome these challenges, this paper proposes an innovative method that leverages deep learning techniques and mobile phone imaging for urban street tree inventory. Our approach utilises a pair of images captured by smartphone cameras to accurately segment tree trunks and compute the diameter at breast height (DBH). Compared to traditional methods, our approach exhibits several advantages, including superior accuracy, reduced dependency on specialised equipment, and applicability in hard-to-reach areas. We evaluated our method on a comprehensive dataset of 400 trees and achieved a DBH estimation accuracy with an error rate of less than 2.5%. Our method holds significant potential for substantially improving forest management practices. By enhancing the accuracy and efficiency of tree inventory, our model empowers urban management to mitigate the adverse effects of deforestation and climate change.
- [302] arXiv:2401.01183 (cross-list from cs.CL) [ pdf , other ]
-
Title: Unifying Structured Data as Graph for Data-to-Text Pre-TrainingAuthors: Shujie Li , Liang Li , Ruiying Geng , Min Yang , Binhua Li , Guanghu Yuan , Wanwei He , Shao Yuan , Can Ma , Fei Huang , Yongbin LiComments: Accepted for TACL. Pre-MIT Press publication versionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Data-to-text (D2T) generation aims to transform structured data into natural language text. Data-to-text pre-training has proved to be powerful in enhancing D2T generation and yields impressive performances. However, previous pre-training methods either oversimplified structured data into a sequence without considering input structures or designed training objectives tailored for a specific data structure (e.g., table or knowledge graph). In this paper, we unify different types of structured data (i.e., table, key-value data, knowledge graph) into the graph format and cast different data-to-text generation tasks as graph-to-text generation. To effectively exploit the structural information of the input graph, we propose a structure-enhanced pre-training method for D2T generation by designing a structure-enhanced Transformer. Concretely, we devise a position matrix for the Transformer, encoding relative positional information of connected nodes in the input graph. In addition, we propose a new attention matrix to incorporate graph structures into the original Transformer by taking the available explicit connectivity structure into account. Extensive experiments on six benchmark datasets show the effectiveness of our model. Our source codes are available at this https URL .
- [303] arXiv:2401.01189 (cross-list from cs.RO) [ pdf , other ]
-
Title: NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environmentsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Neural implicit representations have been explored to enhance visual SLAM algorithms, especially in providing high-fidelity dense map. Existing methods operate robustly in static scenes but struggle with the disruption caused by moving objects. In this paper we present NID-SLAM, which significantly improves the performance of neural SLAM in dynamic environments. We propose a new approach to enhance inaccurate regions in semantic masks, particularly in marginal areas. Utilizing the geometric information present in depth images, this method enables accurate removal of dynamic objects, thereby reducing the probability of camera drift. Additionally, we introduce a keyframe selection strategy for dynamic scenes, which enhances camera tracking robustness against large-scale objects and improves the efficiency of mapping. Experiments on publicly available RGB-D datasets demonstrate that our method outperforms competitive neural SLAM approaches in tracking accuracy and mapping quality in dynamic environments.
- [304] arXiv:2401.01197 (cross-list from cs.CL) [ pdf , other ]
-
Title: Uncertainty Resolution in Misinformation DetectionAuthors: Yury Orlovskiy , Camille Thibault , Anne Imouza , Jean-François Godbout , Reihaneh Rabbany , Kellin PelrineSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Misinformation poses a variety of risks, such as undermining public trust and distorting factual discourse. Large Language Models (LLMs) like GPT-4 have been shown effective in mitigating misinformation, particularly in handling statements where enough context is provided. However, they struggle to assess ambiguous or context-deficient statements accurately. This work introduces a new method to resolve uncertainty in such statements. We propose a framework to categorize missing information and publish category labels for the LIAR-New dataset, which is adaptable to cross-domain content with missing information. We then leverage this framework to generate effective user queries for missing context. Compared to baselines, our method improves the rate at which generated questions are answerable by the user by 38 percentage points and classification performance by over 10 percentage points macro F1. Thus, this approach may provide a valuable component for future misinformation mitigation pipelines.
- [305] arXiv:2401.01199 (cross-list from cs.LG) [ pdf , other ]
-
Title: JMA: a General Algorithm to Craft Nearly Optimal Targeted Adversarial ExampleSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Most of the approaches proposed so far to craft targeted adversarial examples against Deep Learning classifiers are highly suboptimal and typically rely on increasing the likelihood of the target class, thus implicitly focusing on one-hot encoding settings. In this paper, we propose a more general, theoretically sound, targeted attack that resorts to the minimization of a Jacobian-induced MAhalanobis distance (JMA) term, taking into account the effort (in the input space) required to move the latent space representation of the input sample in a given direction. The minimization is solved by exploiting the Wolfe duality theorem, reducing the problem to the solution of a Non-Negative Least Square (NNLS) problem. The proposed algorithm provides an optimal solution to a linearized version of the adversarial example problem originally introduced by Szegedy et al. \cite{szegedy2013intriguing}. The experiments we carried out confirm the generality of the proposed attack which is proven to be effective under a wide variety of output encoding schemes. Noticeably, the JMA attack is also effective in a multi-label classification scenario, being capable to induce a targeted modification of up to half the labels in a complex multilabel classification scenario with 20 labels, a capability that is out of reach of all the attacks proposed so far. As a further advantage, the JMA attack usually requires very few iterations, thus resulting more efficient than existing methods.
- [306] arXiv:2401.01200 (cross-list from cs.CV) [ pdf , other ]
-
Title: Skin cancer diagnosis using NIR spectroscopy data of skin lesions in vivo using machine learning algorithmsAuthors: Flavio P. Loss , Pedro H. da Cunha , Matheus B. Rocha , Madson Poltronieri Zanoni , Leandro M. de Lima , Isadora Tavares Nascimento , Isabella Rezende , Tania R. P. Canuto , Luciana de Paula Vieira , Renan Rossoni , Maria C. S. Santos , Patricia Lyra Frasson , Wanderson Romão , Paulo R. Filgueiras , Renato A. KrohlingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Skin lesions are classified in benign or malignant. Among the malignant, melanoma is a very aggressive cancer and the major cause of deaths. So, early diagnosis of skin cancer is very desired. In the last few years, there is a growing interest in computer aided diagnostic (CAD) using most image and clinical data of the lesion. These sources of information present limitations due to their inability to provide information of the molecular structure of the lesion. NIR spectroscopy may provide an alternative source of information to automated CAD of skin lesions. The most commonly used techniques and classification algorithms used in spectroscopy are Principal Component Analysis (PCA), Partial Least Squares - Discriminant Analysis (PLS-DA), and Support Vector Machines (SVM). Nonetheless, there is a growing interest in applying the modern techniques of machine and deep learning (MDL) to spectroscopy. One of the main limitations to apply MDL to spectroscopy is the lack of public datasets. Since there is no public dataset of NIR spectral data to skin lesions, as far as we know, an effort has been made and a new dataset named NIR-SC-UFES, has been collected, annotated and analyzed generating the gold-standard for classification of NIR spectral data to skin cancer. Next, the machine learning algorithms XGBoost, CatBoost, LightGBM, 1D-convolutional neural network (1D-CNN) were investigated to classify cancer and non-cancer skin lesions. Experimental results indicate the best performance obtained by LightGBM with pre-processing using standard normal variate (SNV), feature extraction providing values of 0.839 for balanced accuracy, 0.851 for recall, 0.852 for precision, and 0.850 for F-score. The obtained results indicate the first steps in CAD of skin lesions aiming the automated triage of patients with skin lesions in vivo using NIR spectral data.
- [307] arXiv:2401.01204 (cross-list from cs.CR) [ pdf , other ]
-
Title: PPBFL: A Privacy Protected Blockchain-based Federated Learning ModelSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: With the rapid development of machine learning and a growing concern for data privacy, federated learning has become a focal point of attention. However, attacks on model parameters and a lack of incentive mechanisms hinder the effectiveness of federated learning. Therefore, we propose A Privacy Protected Blockchain-based Federated Learning Model (PPBFL) to enhance the security of federated learning and encourage active participation of nodes in model training. Blockchain technology ensures the integrity of model parameters stored in the InterPlanetary File System (IPFS), providing protection against tampering. Within the blockchain, we introduce a Proof of Training Work (PoTW) consensus algorithm tailored for federated learning, aiming to incentive training nodes. This algorithm rewards nodes with greater computational power, promoting increased participation and effort in the federated learning process. A novel adaptive differential privacy algorithm is simultaneously applied to local and global models. This safeguards the privacy of local data at training clients, preventing malicious nodes from launching inference attacks. Additionally, it enhances the security of the global model, preventing potential security degradation resulting from the combination of numerous local models. The possibility of security degradation is derived from the composition theorem. By introducing reverse noise in the global model, a zero-bias estimate of differential privacy noise between local and global models is achieved. Furthermore, we propose a new mix transactions mechanism utilizing ring signature technology to better protect the identity privacy of local training clients. Security analysis and experimental results demonstrate that PPBFL, compared to baseline methods, not only exhibits superior model performance but also achieves higher security.
- [308] arXiv:2401.01218 (cross-list from cs.CL) [ pdf , other ]
-
Title: Zero-Shot Position Debiasing for Large Language ModelsComments: The version for ARR submission; 20 pages, 22 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Fine-tuning has been demonstrated to be an effective method to improve the domain performance of large language models (LLMs). However, LLMs might fit the dataset bias and shortcuts for prediction, leading to poor generation performance. Previous works have proven that LLMs are prone to exhibit position bias, i.e., leveraging information positioned at the beginning or end, or specific positional cues within the input. Existing debiasing methods for LLMs require external bias knowledge or annotated non-biased samples, which is lacking for position debiasing and impractical in reality. In this work, we propose a zero-shot position debiasing (ZOE) framework to mitigate position bias for LLMs. ZOE leverages unsupervised responses from pre-trained LLMs for debiasing without relying on any external knowledge. To improve the quality of unsupervised responses, we propose a MSA module to prune these responses. Experiments on eight datasets and five tasks show that ZOE consistently outperforms existing methods in mitigating three types of position biases. Besides, ZOE achieves this by sacrificing only a small performance on biased samples, which is general and effective. To facilitate the reproducibility of the results, we share the code of all methods and datasets on https://anonymous.4open.science/r/ZOE-F06B.
- [309] arXiv:2401.01227 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: IdentiFace : A VGG Based Multimodal Facial Biometric SystemComments: 12 pages, 22 figures and 9 imagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The development of facial biometric systems has contributed greatly to the development of the computer vision field. Nowadays, there's always a need to develop a multimodal system that combines multiple biometric traits in an efficient, meaningful way. In this paper, we introduce "IdentiFace" which is a multimodal facial biometric system that combines the core of facial recognition with some of the most important soft biometric traits such as gender, face shape, and emotion. We also focused on developing the system using only VGG-16 inspired architecture with minor changes across different subsystems. This unification allows for simpler integration across modalities. It makes it easier to interpret the learned features between the tasks which gives a good indication about the decision-making process across the facial modalities and potential connection. For the recognition problem, we acquired a 99.2% test accuracy for five classes with high intra-class variations using data collected from the FERET database[1]. We achieved 99.4% on our dataset and 95.15% on the public dataset[2] in the gender recognition problem. We were also able to achieve a testing accuracy of 88.03% in the face-shape problem using the celebrity face-shape dataset[3]. Finally, we achieved a decent testing accuracy of 66.13% in the emotion task which is considered a very acceptable accuracy compared to related work on the FER2013 dataset[4].
- [310] arXiv:2401.01242 (cross-list from cs.LG) [ pdf , other ]
-
Title: Encoding Binary Events from Continuous Time Series in Rooted Trees using Contrastive LearningComments: Extended abstract presented as a poster at the Northern Lights Deep Learning Conference 2024 in Troms{\o}, NorwaySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Abstract: Broadband infrastructure owners do not always know how their customers are connected in the local networks, which are structured as rooted trees. A recent study is able to infer the topology of a local network using discrete time series data from the leaves of the tree (customers). In this study we propose a contrastive approach for learning a binary event encoder from continuous time series data. As a preliminary result, we show that our approach has some potential in learning a valuable encoder.
- [311] arXiv:2401.01259 (cross-list from cs.LG) [ pdf , other ]
-
Title: Do Concept Bottleneck Models Obey Locality?Comments: 12 pages, accepted at NeurIPS'23 XAI WorkshopSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Concept-based learning improves a deep learning model's interpretability by explaining its predictions via human-understandable concepts. Deep learning models trained under this paradigm heavily rely on the assumption that neural networks can learn to predict the presence or absence of a given concept independently of other concepts. Recent work, however, strongly suggests that this assumption may fail to hold in Concept Bottleneck Models (CBMs), a quintessential family of concept-based interpretable architectures. In this paper, we investigate whether CBMs correctly capture the degree of conditional independence across concepts when such concepts are localised both spatially, by having their values entirely defined by a fixed subset of features, and semantically, by having their values correlated with only a fixed subset of predefined concepts. To understand locality, we analyse how changes to features outside of a concept's spatial or semantic locality impact concept predictions. Our results suggest that even in well-defined scenarios where the presence of a concept is localised to a fixed feature subspace, or whose semantics are correlated to a small subset of other concepts, CBMs fail to learn this locality. These results cast doubt upon the quality of concept representations learnt by CBMs and strongly suggest that concept-based explanations may be fragile to changes outside their localities.
- [312] arXiv:2401.01262 (cross-list from cs.CL) [ pdf , other ]
-
Title: Fairness Certification for Natural Language Processing and Large Language ModelsComments: In depth discussion of our results can be found in the Appendix BSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: Natural Language Processing (NLP) plays an important role in our daily lives, particularly due to the enormous progress of Large Language Models (LLM). However, NLP has many fairness-critical use cases, e.g., as an expert system in recruitment or as an LLM-based tutor in education. Since NLP is based on human language, potentially harmful biases can diffuse into NLP systems and produce unfair results, discriminate against minorities or generate legal issues. Hence, it is important to develop a fairness certification for NLP approaches. We follow a qualitative research approach towards a fairness certification for NLP. In particular, we have reviewed a large body of literature on algorithmic fairness, and we have conducted semi-structured expert interviews with a wide range of experts from that area. We have systematically devised six fairness criteria for NLP, which can be further refined into 18 sub-categories. Our criteria offer a foundation for operationalizing and testing processes to certify fairness, both from the perspective of the auditor and the audited organization.
- [313] arXiv:2401.01265 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: Optimal Synthesis of Finite State Machines with Universal Gates using Evolutionary AlgorithmSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: This work presents an optimization method for the synthesis of finite state machines. The focus is on the reduction in the on-chip area and the cost of the circuit. A list of finite state machines from MCNC91 benchmark circuits have been evolved using Cartesian Genetic Programming. On the average, almost 30% of reduction in the total number of gates has been achieved. The effects of some parameters on the evolutionary process have also been discussed in the paper.
- [314] arXiv:2401.01269 (cross-list from cs.CR) [ pdf , other ]
-
Title: LLbezpeky: Leveraging Large Language Models for Vulnerability DetectionComments: This project report was presented as a part of the course CS858 at the University of Waterloo under the supervision of Prof. Yousra AaferSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: Despite the continued research and progress in building secure systems, Android applications continue to be ridden with vulnerabilities, necessitating effective detection methods. Current strategies involving static and dynamic analysis tools come with limitations like overwhelming number of false positives and limited scope of analysis which make either difficult to adopt. Over the past years, machine learning based approaches have been extensively explored for vulnerability detection, but its real-world applicability is constrained by data requirements and feature engineering challenges. Large Language Models (LLMs), with their vast parameters, have shown tremendous potential in understanding semnatics in human as well as programming languages. We dive into the efficacy of LLMs for detecting vulnerabilities in the context of Android security. We focus on building an AI-driven workflow to assist developers in identifying and rectifying vulnerabilities. Our experiments show that LLMs outperform our expectations in finding issues within applications correctly flagging insecure apps in 91.67% of cases in the Ghera benchmark. We use inferences from our experiments towards building a robust and actionable vulnerability detection system and demonstrate its effectiveness. Our experiments also shed light on how different various simple configurations can affect the True Positive (TP) and False Positive (FP) rates.
- [315] arXiv:2401.01286 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Comprehensive Study of Knowledge Editing for Large Language ModelsAuthors: Ningyu Zhang , Yunzhi Yao , Bozhong Tian , Peng Wang , Shumin Deng , Mengru Wang , Zekun Xi , Shengyu Mao , Jintian Zhang , Yuansheng Ni , Siyuan Cheng , Ziwen Xu , Xin Xu , Jia-Chen Gu , Yong Jiang , Pengjun Xie , Fei Huang , Lei Liang , Zhiqiang Zhang , Xiaowei Zhu , Jun Zhou , Huajun ChenComments: Ongoing work; 52 pages, 282 citations; benchmark is available at this https URL code is available at this https URL paper list is available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning in the techniques of knowledge editing for LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications.
- [316] arXiv:2401.01288 (cross-list from cs.IT) [ pdf , other ]
-
Title: Physics-informed Generalizable Wireless Channel Modeling with Segmentation and Deep Learning: Fundamentals, Methodologies, and ChallengesComments: Submitted to IEEE Magazine for potential future publicationsSubjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Channel modeling is fundamental in advancing wireless systems and has thus attracted considerable research focus. Recent trends have seen a growing reliance on data-driven techniques to facilitate the modeling process and yield accurate channel predictions. In this work, we first provide a concise overview of data-driven channel modeling methods, highlighting their limitations. Subsequently, we introduce the concept and advantages of physics-informed neural network (PINN)-based modeling and a summary of recent contributions in this area. Our findings demonstrate that PINN-based approaches in channel modeling exhibit promising attributes such as generalizability, interpretability, and robustness. We offer a comprehensive architecture for PINN methodology, designed to inform and inspire future model development. A case-study of our recent work on precise indoor channel prediction with semantic segmentation and deep learning is presented. The study concludes by addressing the challenges faced and suggesting potential research directions in this field.
- [317] arXiv:2401.01301 (cross-list from cs.CL) [ pdf , other ]
-
Title: Large Legal Fictions: Profiling Legal Hallucinations in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Large language models (LLMs) have the potential to transform the practice of law, but this potential is threatened by the presence of legal hallucinations -- responses from these models that are not consistent with legal facts. We investigate the extent of these hallucinations using an original suite of legal queries, comparing LLMs' responses to structured legal metadata and examining their consistency. Our work makes four key contributions: (1) We develop a typology of legal hallucinations, providing a conceptual framework for future research in this area. (2) We find that legal hallucinations are alarmingly prevalent, occurring between 69% of the time with ChatGPT 3.5 and 88% with Llama 2, when these models are asked specific, verifiable questions about random federal court cases. (3) We illustrate that LLMs often fail to correct a user's incorrect legal assumptions in a contra-factual question setup. (4) We provide evidence that LLMs cannot always predict, or do not always know, when they are producing legal hallucinations. Taken together, these findings caution against the rapid and unsupervised integration of popular LLMs into legal tasks. Even experienced lawyers must remain wary of legal hallucinations, and the risks are highest for those who stand to benefit from LLMs the most -- pro se litigants or those without access to traditional legal resources.
- [318] arXiv:2401.01304 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Experimental Validation of Sensor Fusion-based GNSS Spoofing Attack Detection Framework for Autonomous VehiclesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we validate the performance of the a sensor fusion-based Global Navigation Satellite System (GNSS) spoofing attack detection framework for Autonomous Vehicles (AVs). To collect data, a vehicle equipped with a GNSS receiver, along with Inertial Measurement Unit (IMU) is used. The detection framework incorporates two strategies: The first strategy involves comparing the predicted location shift, which is the distance traveled between two consecutive timestamps, with the inertial sensor-based location shift. For this purpose, data from low-cost in-vehicle inertial sensors such as the accelerometer and gyroscope sensor are fused and fed into a long short-term memory (LSTM) neural network. The second strategy employs a Random-Forest supervised machine learning model to detect and classify turns, distinguishing between left and right turns using the output from the steering angle sensor. In experiments, two types of spoofing attack models: turn-by-turn and wrong turn are simulated. These spoofing attacks are modeled as SQL injection attacks, where, upon successful implementation, the navigation system perceives injected spoofed location information as legitimate while being unable to detect legitimate GNSS signals. Importantly, the IMU data remains uncompromised throughout the spoofing attack. To test the effectiveness of the detection framework, experiments are conducted in Tuscaloosa, AL, mimicking urban road structures. The results demonstrate the framework's ability to detect various sophisticated GNSS spoofing attacks, even including slow position drifting attacks. Overall, the experimental results showcase the robustness and efficacy of the sensor fusion-based spoofing attack detection approach in safeguarding AVs against GNSS spoofing threats.
- [319] arXiv:2401.01325 (cross-list from cs.CL) [ pdf , other ]
-
Title: LLM Maybe LongLM: Self-Extend LLM Context Window Without TuningAuthors: Hongye Jin , Xiaotian Han , Jingfeng Yang , Zhimeng Jiang , Zirui Liu , Chia-Yuan Chang , Huiyuan Chen , Xia HuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: It is well known that LLMs cannot generalize well to long contexts whose lengths are larger than the training sequence length. This poses challenges when employing LLMs for processing long input sequences during inference. In this work, we argue that LLMs themselves have inherent capabilities to handle long contexts without fine-tuning. To achieve this goal, we propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information: the grouped attention and the neighbor attention. The grouped attention captures the dependencies among tokens that are far apart, while neighbor attention captures dependencies among adjacent tokens within a specified range. The two-level attentions are computed based on the original model's self-attention mechanism during inference. With minor code modification, our SelfExtend can effortlessly extend existing LLMs' context window without any fine-tuning. We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length. The code can be found at \url{ this https URL }.
- [320] arXiv:2401.01326 (cross-list from cs.CL) [ pdf , other ]
-
Title: An Autoregressive Text-to-Graph Framework for Joint Entity and Relation ExtractionComments: AAAI 2024 (camera ready version)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this paper, we propose a novel method for joint entity and relation extraction from unstructured text by framing it as a conditional sequence generation problem. In contrast to conventional generative information extraction models that are left-to-right token-level generators, our approach is \textit{span-based}. It generates a linearized graph where nodes represent text spans and edges represent relation triplets. Our method employs a transformer encoder-decoder architecture with pointing mechanism on a dynamic vocabulary of spans and relation types. Our model can capture the structural characteristics and boundaries of entities and relations through span representations while simultaneously grounding the generated output in the original text thanks to the pointing mechanism. Evaluation on benchmark datasets validates the effectiveness of our approach, demonstrating competitive results. Code is available at this https URL .
- [321] arXiv:2401.01330 (cross-list from cs.IR) [ pdf , other ]
-
Title: TREC iKAT 2023: The Interactive Knowledge Assistance Track OverviewAuthors: Mohammad Aliannejadi , Zahra Abbasiantaeb , Shubham Chatterjee , Jeffery Dalton , Leif AzzopardiComments: TREC iKAT 2023 Overview PaperSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Conversational Information Seeking has evolved rapidly in the last few years with the development of Large Language Models providing the basis for interpreting and responding in a naturalistic manner to user requests. iKAT emphasizes the creation and research of conversational search agents that adapt responses based on the user's prior interactions and present context. This means that the same question might yield varied answers, contingent on the user's profile and preferences. The challenge lies in enabling Conversational Search Agents (CSA) to incorporate personalized context to effectively guide users through the relevant information to them. iKAT's first year attracted seven teams and a total of 24 runs. Most of the runs leveraged Large Language Models (LLMs) in their pipelines, with a few focusing on a generate-then-retrieve approach.
- [322] arXiv:2401.01335 (cross-list from cs.LG) [ pdf , other ]
-
Title: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language ModelsComments: 28 pages, 6 figures, 6 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Abstract: Harnessing the power of human-annotated data through Supervised Fine-Tuning (SFT) is pivotal for advancing Large Language Models (LLMs). In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data. We propose a new fine-tuning method called Self-Play fIne-tuNing (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its own training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data. Our method progressively elevates the LLM from a nascent model to a formidable one, unlocking the full potential of human-annotated demonstration data for SFT. Theoretically, we prove that the global optimum to the training objective function of our method is achieved only when the LLM policy aligns with the target data distribution. Empirically, we evaluate our method on several benchmark datasets including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench. Our results show that SPIN can significantly improve the LLM's performance across a variety of benchmarks and even outperform models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data. This sheds light on the promise of self-play, enabling the achievement of human-level performance in LLMs without the need for expert opponents. Codes are available at this https URL .
- [323] arXiv:2401.01343 (cross-list from cs.CR) [ pdf , other ]
-
Title: IoTGeM: Generalizable Models for Behaviour-Based IoT Attack DetectionComments: 25 pages (13 main, 12 supplementary appendix), 20 figures, 14 tablesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract: Previous research on behaviour-based attack detection on networks of IoT devices has resulted in machine learning models whose ability to adapt to unseen data is limited, and often not demonstrated. In this paper we present an approach for modelling IoT network attacks that focuses on generalizability, yet also leads to better detection and performance. First, we present an improved rolling window approach for feature extraction, and introduce a multi-step feature selection process that reduces overfitting. Second, we build and test models using isolated train and test datasets, thereby avoiding common data leaks that have limited the generalizability of previous models. Third, we rigorously evaluate our methodology using a diverse portfolio of machine learning models, evaluation metrics and datasets. Finally, we build confidence in the models by using explainable AI techniques, allowing us to identify the features that underlie accurate detection of attacks.
- [324] arXiv:2401.01369 (cross-list from cs.IR) [ pdf , other ]
-
Title: RL-MPCA: A Reinforcement Learning Based Multi-Phase Computation Allocation Approach for Recommender SystemsAuthors: Jiahong Zhou , Shunhui Mao , Guoliang Yang , Bo Tang , Qianlong Xie , Lebin Lin , Xingxing Wang , Dong WangComments: 11 pages, 7 figures, published to Proceedings of the ACM Web Conference 2023Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recommender systems aim to recommend the most suitable items to users from a large number of candidates. Their computation cost grows as the number of user requests and the complexity of services (or models) increases. Under the limitation of computation resources (CRs), how to make a trade-off between computation cost and business revenue becomes an essential question. The existing studies focus on dynamically allocating CRs in queue truncation scenarios (i.e., allocating the size of candidates), and formulate the CR allocation problem as an optimization problem with constraints. Some of them focus on single-phase CR allocation, and others focus on multi-phase CR allocation but introduce some assumptions about queue truncation scenarios. However, these assumptions do not hold in other scenarios, such as retrieval channel selection and prediction model selection. Moreover, existing studies ignore the state transition process of requests between different phases, limiting the effectiveness of their approaches.
This paper proposes a Reinforcement Learning (RL) based Multi-Phase Computation Allocation approach (RL-MPCA), which aims to maximize the total business revenue under the limitation of CRs. RL-MPCA formulates the CR allocation problem as a Weakly Coupled MDP problem and solves it with an RL-based approach. Specifically, RL-MPCA designs a novel deep Q-network to adapt to various CR allocation scenarios, and calibrates the Q-value by introducing multiple adaptive Lagrange multipliers (adaptive-$\lambda$) to avoid violating the global CR constraints. Finally, experiments on the offline simulation environment and online real-world recommender system validate the effectiveness of our approach. - [325] arXiv:2401.01373 (cross-list from cs.CV) [ pdf , other ]
-
Title: Boosting Defect Detection in Manufacturing using Tensor Convolutional Neural NetworksComments: 12 pages, 4 figures, 2 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantum Physics (quant-ph)
Abstract: Defect detection is one of the most important yet challenging tasks in the quality control stage in the manufacturing sector. In this work, we introduce a Tensor Convolutional Neural Network (T-CNN) and examine its performance on a real defect detection application in one of the components of the ultrasonic sensors produced at Robert Bosch's manufacturing plants. Our quantum-inspired T-CNN operates on a reduced model parameter space to substantially improve the training speed and performance of an equivalent CNN model without sacrificing accuracy. More specifically, we demonstrate how T-CNNs are able to reach the same performance as classical CNNs as measured by quality metrics, with up to fifteen times fewer parameters and 4% to 19% faster training times. Our results demonstrate that the T-CNN greatly outperforms the results of traditional human visual inspection, providing value in a current real application in manufacturing.
- [326] arXiv:2401.01377 (cross-list from cs.CR) [ pdf , other ]
-
Title: Does Few-shot Learning Suffer from Backdoor Attacks?Comments: AAAI2024Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The field of few-shot learning (FSL) has shown promising results in scenarios where training data is limited, but its vulnerability to backdoor attacks remains largely unexplored. We first explore this topic by first evaluating the performance of the existing backdoor attack methods on few-shot learning scenarios. Unlike in standard supervised learning, existing backdoor attack methods failed to perform an effective attack in FSL due to two main issues. Firstly, the model tends to overfit to either benign features or trigger features, causing a tough trade-off between attack success rate and benign accuracy. Secondly, due to the small number of training samples, the dirty label or visible trigger in the support set can be easily detected by victims, which reduces the stealthiness of attacks. It seemed that FSL could survive from backdoor attacks. However, in this paper, we propose the Few-shot Learning Backdoor Attack (FLBA) to show that FSL can still be vulnerable to backdoor attacks. Specifically, we first generate a trigger to maximize the gap between poisoned and benign features. It enables the model to learn both benign and trigger features, which solves the problem of overfitting. To make it more stealthy, we hide the trigger by optimizing two types of imperceptible perturbation, namely attractive and repulsive perturbation, instead of attaching the trigger directly. Once we obtain the perturbations, we can poison all samples in the benign support set into a hidden poisoned support set and fine-tune the model on it. Our method demonstrates a high Attack Success Rate (ASR) in FSL tasks with different few-shot learning paradigms while preserving clean accuracy and maintaining stealthiness. This study reveals that few-shot learning still suffers from backdoor attacks, and its security should be given attention.
- [327] arXiv:2401.01384 (cross-list from cs.SI) [ pdf , other ]
-
Title: Strong Transitivity Relations and Graph Neural NetworksSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Local neighborhoods play a crucial role in embedding generation in graph-based learning. It is commonly believed that nodes ought to have embeddings that resemble those of their neighbors. In this research, we try to carefully expand the concept of similarity from nearby neighborhoods to the entire graph. We provide an extension of similarity that is based on transitivity relations, which enables Graph Neural Networks (GNNs) to capture both global similarities and local similarities over the whole graph. We introduce Transitivity Graph Neural Network (TransGNN), which more than local node similarities, takes into account global similarities by distinguishing strong transitivity relations from weak ones and exploiting them. We evaluate our model over several real-world datasets and showed that it considerably improves the performance of several well-known GNN models, for tasks such as node classification.
- [328] arXiv:2401.01388 (cross-list from cs.CV) [ pdf , other ]
-
Title: Directional Antenna Systems for Long-Range Through-Wall Human Activity RecognitionComments: arXiv admin note: text overlap with arXiv:2401.00964Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: WiFi Channel State Information (CSI)-based human activity recognition (HAR) enables contactless, long-range sensing in spatially constrained environments while preserving visual privacy. However, despite the presence of numerous WiFi-enabled devices around us, few expose CSI to users, resulting in a lack of sensing hardware options. Variants of the Espressif ESP32 have emerged as potential low-cost and easy-to-deploy solutions for WiFi CSI-based HAR. In this work, four ESP32-S3-based 2.4GHz directional antenna systems are evaluated for their ability to facilitate long-range through-wall HAR. Two promising systems are proposed, one of which combines the ESP32-S3 with a directional biquad antenna. This combination represents, to the best of our knowledge, the first demonstration of such a system in WiFi-based HAR. The second system relies on the built-in printed inverted-F antenna (PIFA) of the ESP32-S3 and achieves directionality through a plane reflector. In a comprehensive evaluation of line-of-sight (LOS) and non-line-of-sight (NLOS) HAR performance, both systems are deployed in an office environment spanning a distance of 18 meters across five rooms. In this experimental setup, the Wallhack1.8k dataset, comprising 1806 CSI amplitude spectrograms of human activities, is collected and made publicly available. Based on Wallhack1.8k, we train activity recognition models using the EfficientNetV2 architecture to assess system performance in LOS and NLOS scenarios. For the core NLOS activity recognition problem, the biquad antenna and PIFA-based systems achieve accuracies of 92.0$\pm$3.5 and 86.8$\pm$4.7, respectively, demonstrating the feasibility of long-range through-wall HAR with the proposed systems.
- [329] arXiv:2401.01405 (cross-list from cs.CL) [ pdf , other ]
-
Title: Quantifying the Uniqueness of Donald Trump in Presidential DiscourseAuthors: Karen Zhou , Alexander A. Meitus , Milo Chase , Grace Wang , Anne Mykland , William Howell , Chenhao TanSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
Abstract: Does Donald Trump speak differently from other presidents? If so, in what ways? Are these differences confined to any single medium of communication? To investigate these questions, this paper introduces a novel metric of uniqueness based on large language models, develops a new lexicon for divisive speech, and presents a framework for comparing the lexical features of political opponents. Applying these tools to a variety of corpora of presidential speeches, we find considerable evidence that Trump's speech patterns diverge from those of all major party nominees for the presidency in recent history. Some notable findings include Trump's employment of particularly divisive and antagonistic language targeting of his political opponents and his patterns of repetition for emphasis. Furthermore, Trump is significantly more distinctive than his fellow Republicans, whose uniqueness values are comparably closer to those of the Democrats. These differences hold across a variety of measurement strategies, arise on both the campaign trail and in official presidential addresses, and do not appear to be an artifact of secular time trends.
- [330] arXiv:2401.01426 (cross-list from cs.LG) [ pdf , other ]
-
Title: Modular Learning of Deep Causal Generative Models for High-dimensional Causal InferenceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Methodology (stat.ME); Machine Learning (stat.ML)
Abstract: Pearl's causal hierarchy establishes a clear separation between observational, interventional, and counterfactual questions. Researchers proposed sound and complete algorithms to compute identifiable causal queries at a given level of the hierarchy using the causal structure and data from the lower levels of the hierarchy. However, most of these algorithms assume that we can accurately estimate the probability distribution of the data, which is an impractical assumption for high-dimensional variables such as images. On the other hand, modern generative deep learning architectures can be trained to learn how to accurately sample from such high-dimensional distributions. Especially with the recent rise of foundation models for images, it is desirable to leverage pre-trained models to answer causal queries with such high-dimensional data. To address this, we propose a sequential training algorithm that, given the causal structure and a pre-trained conditional generative model, can train a deep causal generative model, which utilizes the pre-trained model and can provably sample from identifiable interventional and counterfactual distributions. Our algorithm, called Modular-DCM, uses adversarial training to learn the network weights, and to the best of our knowledge, is the first algorithm that can make use of pre-trained models and provably sample from any identifiable causal query in the presence of latent confounders with high-dimensional data. We demonstrate the utility of our algorithm using semi-synthetic and real-world datasets containing images as variables in the causal structure.
- [331] arXiv:2401.01442 (cross-list from cs.IT) [ pdf , other ]
-
Title: Hierarchical Over-the-Air Federated Learning with Awareness of Interference and Data HeterogeneityComments: To appear at IEEE WCNC 2024. Overlap with arXiv:2211.16162Subjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: When implementing hierarchical federated learning over wireless networks, scalability assurance and the ability to handle both interference and device data heterogeneity are crucial. This work introduces a learning method designed to address these challenges, along with a scalable transmission scheme that efficiently uses a single wireless resource through over-the-air computation. To provide resistance against data heterogeneity, we employ gradient aggregations. Meanwhile, the impact of interference is minimized through optimized receiver normalizing factors. For this, we model a multi-cluster wireless network using stochastic geometry, and characterize the mean squared error of the aggregation estimations as a function of the network parameters. We show that despite the interference and the data heterogeneity, the proposed scheme achieves high learning accuracy and can significantly outperform the conventional hierarchical algorithm.
- [332] arXiv:2401.01458 (cross-list from cs.LG) [ pdf , other ]
-
Title: Concurrent Self-testing of Neural Networks Using Uncertainty FingerprintSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET)
Abstract: Neural networks (NNs) are increasingly used in always-on safety-critical applications deployed on hardware accelerators (NN-HAs) employing various memory technologies. Reliable continuous operation of NN is essential for safety-critical applications. During online operation, NNs are susceptible to single and multiple permanent and soft errors due to factors such as radiation, aging, and thermal effects. Explicit NN-HA testing methods cannot detect transient faults during inference, are unsuitable for always-on applications, and require extensive test vector generation and storage. Therefore, in this paper, we propose the \emph{uncertainty fingerprint} approach representing the online fault status of NN. Furthermore, we propose a dual head NN topology specifically designed to produce uncertainty fingerprints and the primary prediction of the NN in \emph{a single shot}. During the online operation, by matching the uncertainty fingerprint, we can concurrently self-test NNs with up to $100\%$ coverage with a low false positive rate while maintaining a similar performance of the primary task. Compared to existing works, memory overhead is reduced by up to $243.7$ MB, multiply and accumulate (MAC) operation is reduced by up to $10000\times$, and false-positive rates are reduced by up to $89\%$.
- [333] arXiv:2401.01469 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Question-Answering Based Summarization of Electronic Health Records using Retrieval Augmented GenerationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Summarization of electronic health records (EHRs) can substantially minimize 'screen time' for both patients as well as medical personnel. In recent years summarization of EHRs have employed machine learning pipelines using state of the art neural models. However, these models have produced less than adequate results that are attributed to the difficulty of obtaining sufficient annotated data for training. Moreover, the requirement to consider the entire content of an EHR in summarization has resulted in poor performance due to the fact that attention mechanisms in modern large language models (LLMs) adds a quadratic complexity in terms of the size of the input. We propose here a method that mitigates these shortcomings by combining semantic search, retrieval augmented generation (RAG) and question-answering using the latest LLMs. In our approach summarization is the extraction of answers to specific questions that are deemed important by subject-matter experts (SMEs). Our approach is quite efficient; requires minimal to no training; does not suffer from the 'hallucination' problem of LLMs; and it ensures diversity, since the summary will not have repeated content but diverse answers to specific questions.
- [334] arXiv:2401.01470 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: TPC-ViT: Token Propagation Controller for Efficient Vision TransformerAuthors: Wentao ZhuComments: Accepted by the main conference of WACV 2024; well-formatted PDF is in this https URL ; supplementary is in this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE)
Abstract: Vision transformers (ViTs) have achieved promising results on a variety of Computer Vision tasks, however their quadratic complexity in the number of input tokens has limited their application specially in resource-constrained settings. Previous approaches that employ gradual token reduction to address this challenge assume that token redundancy in one layer implies redundancy in all the following layers. We empirically demonstrate that this assumption is often not correct, i.e., tokens that are redundant in one layer can be useful in later layers. We employ this key insight to propose a novel token propagation controller (TPC) that incorporates two different token-distributions, i.e., pause probability and restart probability to control the reduction and reuse of tokens respectively, which results in more efficient token utilization. To improve the estimates of token distributions, we propose a smoothing mechanism that acts as a regularizer and helps remove noisy outliers. Furthermore, to improve the training-stability of our proposed TPC, we introduce a model stabilizer that is able to implicitly encode local image structures and minimize accuracy fluctuations during model training. We present extensive experimental results on the ImageNet-1K dataset using DeiT, LV-ViT and Swin models to demonstrate the effectiveness of our proposed method. For example, compared to baseline models, our proposed method improves the inference speed of the DeiT-S by 250% while increasing the classification accuracy by 1.0%.
- [335] arXiv:2401.01482 (cross-list from cs.CV) [ pdf , other ]
-
Title: Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object RecognitionComments: To appear in IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Existing object recognition models have been shown to lack robustness in diverse geographical scenarios due to domain shifts in design and context. Class representations need to be adapted to more accurately reflect an object concept under these shifts. In the absence of training data from target geographies, we hypothesize that geographically diverse descriptive knowledge of categories can enhance robustness. For this purpose, we explore the feasibility of probing a large language model for geography-based object knowledge, and we examine the effects of integrating knowledge into zero-shot and learnable soft prompting with CLIP. Within this exploration, we propose geography knowledge regularization to ensure that soft prompts trained on a source set of geographies generalize to an unseen target set. Accuracy gains over prompting baselines on DollarStreet while training only on Europe data are up to +2.8/1.2/1.6 on target data from Africa/Asia/Americas, and +4.6 overall on the hardest classes. Competitive performance is shown vs. few-shot target training, and analysis is provided to direct future study of geographical robustness.
- [336] arXiv:2401.01484 (cross-list from cs.LG) [ pdf , other ]
-
Title: Uncertainty Regularized Evidential RegressionComments: Accepted to AAAI 2024 main trackSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The Evidential Regression Network (ERN) represents a novel approach that integrates deep learning with Dempster-Shafer's theory to predict a target and quantify the associated uncertainty. Guided by the underlying theory, specific activation functions must be employed to enforce non-negative values, which is a constraint that compromises model performance by limiting its ability to learn from all samples. This paper provides a theoretical analysis of this limitation and introduces an improvement to overcome it. Initially, we define the region where the models can't effectively learn from the samples. Following this, we thoroughly analyze the ERN and investigate this constraint. Leveraging the insights from our analysis, we address the limitation by introducing a novel regularization term that empowers the ERN to learn from the whole training set. Our extensive experiments substantiate our theoretical findings and demonstrate the effectiveness of the proposed solution.
- [337] arXiv:2401.01493 (cross-list from cs.LG) [ pdf , other ]
-
Title: Free Lunch for Federated Remote Sensing Target Fine-Grained Classification: A Parameter-Efficient FrameworkComments: Under Review, 23 pages, 3 figures, 12 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Remote Sensing Target Fine-grained Classification (TFGC) is of great significance in both military and civilian fields. Due to location differences, growth in data size, and centralized server storage constraints, these data are usually stored under different databases across regions/countries. However, privacy laws and national security concerns constrain researchers from accessing these sensitive remote sensing images for further analysis. Additionally, low-resource remote sensing devices encounter challenges in terms of communication overhead and efficiency when dealing with the ever-increasing data and model scales. To solve the above challenges, this paper proposes a novel Privacy-Reserving TFGC Framework based on Federated Learning, dubbed PRFL. The proposed framework allows each client to learn global and local knowledge to enhance the local representation of private data in environments with extreme statistical heterogeneity (non. Independent and Identically Distributed, IID). Thus, it provides highly customized models to clients with differentiated data distributions. Moreover, the framework minimizes communication overhead and improves efficiency while ensuring satisfactory performance, thereby enhancing robustness and practical applicability under resource-scarce conditions. We demonstrate the effectiveness of the proposed PRFL on the classical TFGC task by leveraging four public datasets.
- [338] arXiv:2401.01519 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Exploring the Frontiers of LLMs in Psychological Applications: A Comprehensive ReviewAuthors: Luoma Ke (1), Song Tong (1), Peng Cheng (2), Kaiping Peng (1) ((1) Department of Psychology, Tsinghua University, (2) School of Social Science, Tsinghua University)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper explores the frontiers of large language models (LLMs) in psychology applications. Psychology has undergone several theoretical changes, and the current use of Artificial Intelligence (AI) and Machine Learning, particularly LLMs, promises to open up new research directions. We provide a detailed exploration of how LLMs like ChatGPT are transforming psychological research. It discusses the impact of LLMs across various branches of psychology, including cognitive and behavioral, clinical and counseling, educational and developmental, and social and cultural psychology, highlighting their potential to simulate aspects of human cognition and behavior. The paper delves into the capabilities of these models to emulate human-like text generation, offering innovative tools for literature review, hypothesis generation, experimental design, experimental subjects, data analysis, academic writing, and peer review in psychology. While LLMs are essential in advancing research methodologies in psychology, the paper also cautions about their technical and ethical challenges. There are issues like data privacy, the ethical implications of using LLMs in psychological research, and the need for a deeper understanding of these models' limitations. Researchers should responsibly use LLMs in psychological studies, adhering to ethical standards and considering the potential consequences of deploying these technologies in sensitive areas. Overall, the article provides a comprehensive overview of the current state of LLMs in psychology, exploring potential benefits and challenges. It serves as a call to action for researchers to leverage LLMs' advantages responsibly while addressing associated risks.
- [339] arXiv:2401.01523 (cross-list from cs.CL) [ pdf , other ]
-
Title: GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social AbuseComments: The first work to benchmark Large Multimodal Models in safety insight on social mediaSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The exponential growth of social media has profoundly transformed how information is created, disseminated, and absorbed, exceeding any precedent in the digital age. Regrettably, this explosion has also spawned a significant increase in the online abuse of memes. Evaluating the negative impact of memes is notably challenging, owing to their often subtle and implicit meanings, which are not directly conveyed through the overt text and imagery. In light of this, large multimodal models (LMMs) have emerged as a focal point of interest due to their remarkable capabilities in handling diverse multimodal tasks. In response to this development, our paper aims to thoroughly examine the capacity of various LMMs (e.g., GPT-4V) to discern and respond to the nuanced aspects of social abuse manifested in memes. We introduce the comprehensive meme benchmark, GOAT-Bench, comprising over 6K varied memes encapsulating themes such as implicit hate speech, sexism, and cyberbullying, etc. Utilizing GOAT-Bench, we delve into the ability of LMMs to accurately assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content. Our extensive experiments across a range of LMMs reveal that current models still exhibit a deficiency in safety awareness, showing insensitivity to various forms of implicit abuse. We posit that this shortfall represents a critical impediment to the realization of safe artificial intelligence. The GOAT-Bench and accompanying resources are publicly accessible at this https URL , contributing to ongoing research in this vital field.
- [340] arXiv:2401.01537 (cross-list from cs.CR) [ pdf , other ]
-
Title: The Art of Deception: Robust Backdoor Attack using Dynamic Stacking of TriggersAuthors: Orson MengaraComments: Accepted by AAAI Workshop 2024, 8 pagesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The area of Machine Learning as a Service (MLaaS) is experiencing increased implementation due to recent advancements in the AI (Artificial Intelligence) industry. However, this spike has prompted concerns regarding AI defense mechanisms, specifically regarding potential covert attacks from third-party providers that cannot be entirely trusted. Recent research has uncovered that auditory backdoors may use certain modifications as their initiating mechanism. DynamicTrigger is introduced as a methodology for carrying out dynamic backdoor attacks that use cleverly designed tweaks to ensure that corrupted samples are indistinguishable from clean. By utilizing fluctuating signal sampling rates and masking speaker identities through dynamic sound triggers (such as the clapping of hands), it is possible to deceive speech recognition systems (ASR). Our empirical testing demonstrates that DynamicTrigger is both potent and stealthy, achieving impressive success rates during covert attacks while maintaining exceptional accuracy with non-poisoned datasets.
- [341] arXiv:2401.01542 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: Adversarial Machine Learning-Enabled Anonymization of OpenWiFi DataAuthors: Samhita Kuili , Kareem Dabbour , Irtiza Hasan , Andrea Herscovich , Burak Kantarci , Marcel Chenier , Melike Erol-KantarciComments: 8 pages, 4 Figures, "Wireless World Research and Trends" Magazine. Initial version was presented in 47th Wireless World Research ForumSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
Abstract: Data privacy and protection through anonymization is a critical issue for network operators or data owners before it is forwarded for other possible use of data. With the adoption of Artificial Intelligence (AI), data anonymization augments the likelihood of covering up necessary sensitive information; preventing data leakage and information loss. OpenWiFi networks are vulnerable to any adversary who is trying to gain access or knowledge on traffic regardless of the knowledge possessed by data owners. The odds for discovery of actual traffic information is addressed by applied conditional tabular generative adversarial network (CTGAN). CTGAN yields synthetic data; which disguises as actual data but fostering hidden acute information of actual data. In this paper, the similarity assessment of synthetic with actual data is showcased in terms of clustering algorithms followed by a comparison of performance for unsupervised cluster validation metrics. A well-known algorithm, K-means outperforms other algorithms in terms of similarity assessment of synthetic data over real data while achieving nearest scores 0.634, 23714.57, and 0.598 as Silhouette, Calinski and Harabasz and Davies Bouldin metric respectively. On exploiting a comparative analysis in validation scores among several algorithms, K-means forms the epitome of unsupervised clustering algorithms ensuring explicit usage of synthetic data at the same time a replacement for real data. Hence, the experimental results aim to show the viability of using CTGAN-generated synthetic data in lieu of publishing anonymized data to be utilized in various applications.
- [342] arXiv:2401.01600 (cross-list from cs.CL) [ pdf , other ]
-
Title: PLLaMa: An Open-source Large Language Model for Plant ScienceComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have exhibited remarkable capabilities in understanding and interacting with natural language across various sectors. However, their effectiveness is limited in specialized areas requiring high accuracy, such as plant science, due to a lack of specific expertise in these fields. This paper introduces PLLaMa, an open-source language model that evolved from LLaMa-2. It's enhanced with a comprehensive database, comprising more than 1.5 million scholarly articles in plant science. This development significantly enriches PLLaMa with extensive knowledge and proficiency in plant and agricultural sciences. Our initial tests, involving specific datasets related to plants and agriculture, show that PLLaMa substantially improves its understanding of plant science-related topics. Moreover, we have formed an international panel of professionals, including plant scientists, agricultural engineers, and plant breeders. This team plays a crucial role in verifying the accuracy of PLLaMa's responses to various academic inquiries, ensuring its effective and reliable application in the field. To support further research and development, we have made the model's checkpoints and source codes accessible to the scientific community. These resources are available for download at \url{ this https URL }.
- [343] arXiv:2401.01614 (cross-list from cs.IR) [ pdf , other ]
-
Title: GPT-4V(ision) is a Generalist Web Agent, if GroundedSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: The recent development on large multimodal models (LMMs), especially GPT-4V(ision) and Gemini, has been quickly expanding the capability boundaries of multimodal models beyond traditional tasks like image captioning and visual question answering. In this work, we explore the potential of LMMs like GPT-4V as a generalist web agent that can follow natural language instructions to complete tasks on any given website. We propose SEEACT, a generalist web agent that harnesses the power of LMMs for integrated visual understanding and acting on the web. We evaluate on the recent MIND2WEB benchmark. In addition to standard offline evaluation on cached websites, we enable a new online evaluation setting by developing a tool that allows running web agents on live websites. We show that GPT-4V presents a great potential for web agents -- it can successfully complete 51.1 of the tasks on live websites if we manually ground its textual plans into actions on the websites. This substantially outperforms text-only LLMs like GPT-4 or smaller models (FLAN-T5 and BLIP-2) specifically fine-tuned for web agents. However, grounding still remains a major challenge. Existing LMM grounding strategies like set-of-mark prompting turns out to be not effective for web agents, and the best grounding strategy we develop in this paper leverages both the HTML structure and visuals. Yet, there is still a substantial gap with oracle grounding, leaving ample room for further improvement. All code, data, and evaluation tools are available at this https URL .
- [344] arXiv:2401.01626 (cross-list from cs.LG) [ src ]
-
Title: On the Expressive Power of Graph Neural NetworksComments: We felt that significantly more work was needed to improve the quality before it should be put out in its current state. No replacement is available at the moment or in the near futureSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The study of Graph Neural Networks has received considerable interest in the past few years. By extending deep learning to graph-structured data, GNNs can solve a diverse set of tasks in fields including social science, chemistry, and medicine. The development of GNN architectures has largely been focused on improving empirical performance on tasks like node or graph classification. However, a line of recent work has instead sought to find GNN architectures that have desirable theoretical properties - by studying their expressive power and designing architectures that maximize this expressiveness.
While there is no consensus on the best way to define the expressiveness of a GNN, it can be viewed from several well-motivated perspectives. Perhaps the most natural approach is to study the universal approximation properties of GNNs, much in the way that this has been studied extensively for MLPs. Another direction focuses on the extent to which GNNs can distinguish between different graph structures, relating this to the graph isomorphism test. Besides, a GNN's ability to compute graph properties such as graph moments has been suggested as another form of expressiveness. All of these different definitions are complementary and have yielded different recommendations for GNN architecture choices. In this paper, we would like to give an overview of the notion of "expressive power" of GNNs and provide some valuable insights regarding the design choices of GNNs. - [345] arXiv:2401.01629 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Synthetic Data in AI: Challenges, Applications, and Ethical ImplicationsAuthors: Shuang Hao , Wenfeng Han , Tao Jiang , Yiping Li , Haonan Wu , Chunlin Zhong , Zhangjun Zhou , He TangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: In the rapidly evolving field of artificial intelligence, the creation and utilization of synthetic datasets have become increasingly significant. This report delves into the multifaceted aspects of synthetic data, particularly emphasizing the challenges and potential biases these datasets may harbor. It explores the methodologies behind synthetic data generation, spanning traditional statistical models to advanced deep learning techniques, and examines their applications across diverse domains. The report also critically addresses the ethical considerations and legal implications associated with synthetic datasets, highlighting the urgent need for mechanisms to ensure fairness, mitigate biases, and uphold ethical standards in AI development.
- [346] arXiv:2401.01651 (cross-list from cs.CV) [ pdf , other ]
-
Title: AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AIComments: Accepted to BenchCouncil Transactions on Benchmarks, Standards and Evaluations (TBench)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The burgeoning field of Artificial Intelligence Generated Content (AIGC) is witnessing rapid advancements, particularly in video generation. This paper introduces AIGCBench, a pioneering comprehensive and scalable benchmark designed to evaluate a variety of video generation tasks, with a primary focus on Image-to-Video (I2V) generation. AIGCBench tackles the limitations of existing benchmarks, which suffer from a lack of diverse datasets, by including a varied and open-domain image-text dataset that evaluates different state-of-the-art algorithms under equivalent conditions. We employ a novel text combiner and GPT-4 to create rich text prompts, which are then used to generate images via advanced Text-to-Image models. To establish a unified evaluation framework for video generation tasks, our benchmark includes 11 metrics spanning four dimensions to assess algorithm performance. These dimensions are control-video alignment, motion effects, temporal consistency, and video quality. These metrics are both reference video-dependent and video-free, ensuring a comprehensive evaluation strategy. The evaluation standard proposed correlates well with human judgment, providing insights into the strengths and weaknesses of current I2V algorithms. The findings from our extensive experiments aim to stimulate further research and development in the I2V field. AIGCBench represents a significant step toward creating standardized benchmarks for the broader AIGC landscape, proposing an adaptable and equitable framework for future assessments of video generation tasks. We have open-sourced the dataset and evaluation code on the project website: this https URL .
- [347] arXiv:2401.01656 (cross-list from cs.GT) [ pdf , other ]
-
Title: Deep Automated Mechanism Design for Integrating Ad Auction and Allocation in FeedComments: 9 pages, 2 figures, PostingSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI)
Abstract: E-commerce platforms usually present an ordered list, mixed with several organic items and an advertisement, in response to each user's page view request. This list, the outcome of ad auction and allocation processes, directly impacts the platform's ad revenue and gross merchandise volume (GMV). Specifically, the ad auction determines which ad is displayed and the corresponding payment, while the ad allocation decides the display positions of the advertisement and organic items. The prevalent methods of segregating the ad auction and allocation into two distinct stages face two problems: 1) Ad auction does not consider externalities, such as the influence of actual display position and context on ad Click-Through Rate (CTR); 2) The ad allocation, which utilizes the auction-winning ad's payment to determine the display position dynamically, fails to maintain incentive compatibility (IC) for the advertisement. For instance, in the auction stage employing the traditional Generalized Second Price (GSP) , even if the winning ad increases its bid, its payment remains unchanged. This implies that the advertisement cannot secure a better position and thus loses the opportunity to achieve higher utility in the subsequent ad allocation stage. Previous research often focused on one of the two stages, neglecting the two-stage problem, which may result in suboptimal outcomes...
- [348] arXiv:2401.01728 (cross-list from cs.LG) [ pdf , other ]
-
Title: Ravnest: Decentralized Asynchronous Training on Heterogeneous DevicesComments: 28 pages, 5 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Modern deep learning models, growing larger and more complex, have demonstrated exceptional generalization and accuracy due to training on huge datasets. This trend is expected to continue. However, the increasing size of these models poses challenges in training, as traditional centralized methods are limited by memory constraints at such scales. This paper proposes an asynchronous decentralized training paradigm for large modern deep learning models that harnesses the compute power of regular heterogeneous PCs with limited resources connected across the internet to achieve favourable performance metrics. Ravnest facilitates decentralized training by efficiently organizing compute nodes into clusters with similar data transfer rates and compute capabilities, without necessitating that each node hosts the entire model. These clusters engage in $\textit{Zero-Bubble Asynchronous Model Parallel}$ training, and a $\textit{Parallel Multi-Ring All-Reduce}$ method is employed to effectively execute global parameter averaging across all clusters. We have framed our asynchronous SGD loss function as a block structured optimization problem with delayed updates and derived an optimal convergence rate of $O\left(\frac{1}{\sqrt{K}}\right)$. We further discuss linear speedup with respect to the number of participating clusters and the bound on the staleness parameter.
- [349] arXiv:2401.01732 (cross-list from cs.LG) [ pdf , other ]
-
Title: Task and Explanation NetworkAuthors: Moshe SipperSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: Explainability in deep networks has gained increased importance in recent years. We argue herein that an AI must be tasked not just with a task but also with an explanation of why said task was accomplished as such. We present a basic framework -- Task and Explanation Network (TENet) -- which fully integrates task completion and its explanation. We believe that the field of AI as a whole should insist -- quite emphatically -- on explainability.
- [350] arXiv:2401.01754 (cross-list from cs.SE) [ pdf , other ]
-
Title: Using AI/ML to Find and Remediate Enterprise Secrets in Code & Document Sharing PlatformsSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: We introduce a new challenge to the software development community: 1) leveraging AI to accurately detect and flag up secrets in code and on popular document sharing platforms that frequently used by developers, such as Confluence and 2) automatically remediating the detections (e.g. by suggesting password vault functionality). This is a challenging, and mostly unaddressed task. Existing methods leverage heuristics and regular expressions, that can be very noisy, and therefore increase toil on developers. The next step - modifying code itself - to automatically remediate a detection, is a complex task. We introduce two baseline AI models that have good detection performance and propose an automatic mechanism for remediating secrets found in code, opening up the study of this task to the wider community.
- [351] arXiv:2401.01755 (cross-list from cs.SD) [ pdf , other ]
-
Title: Incremental FastPitch: Chunk-based High Quality Text to SpeechComments: 5 pages, 4 figures, 1 tableSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: Parallel text-to-speech models have been widely applied for real-time speech synthesis, and they offer more controllability and a much faster synthesis process compared with conventional auto-regressive models. Although parallel models have benefits in many aspects, they become naturally unfit for incremental synthesis due to their fully parallel architecture such as transformer. In this work, we propose Incremental FastPitch, a novel FastPitch variant capable of incrementally producing high-quality Mel chunks by improving the architecture with chunk-based FFT blocks, training with receptive-field constrained chunk attention masks, and inference with fixed size past model states. Experimental results show that our proposal can produce speech quality comparable to the parallel FastPitch, with a significant lower latency that allows even lower response time for real-time speech applications.
- [352] arXiv:2401.01788 (cross-list from cs.LG) [ pdf , other ]
-
Title: Applications of machine learning and IoT for Outdoor Air Pollution Monitoring and Prediction: A Systematic Literature ReviewSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: According to the World Health Organization (WHO), air pollution kills seven million people every year. Outdoor air pollution is a major environmental health problem affecting low, middle, and high-income countries. In the past few years, the research community has explored IoT-enabled machine learning applications for outdoor air pollution prediction. The general objective of this paper is to systematically review applications of machine learning and Internet of Things (IoT) for outdoor air pollution prediction and the combination of monitoring sensors and input features used. Two research questions were formulated for this review. 1086 publications were collected in the initial PRISMA stage. After the screening and eligibility phases, 37 papers were selected for inclusion. A cost-based analysis was conducted on the findings to highlight high-cost monitoring, low-cost IoT and hybrid enabled prediction. Three methods of prediction were identified: time series, feature-based and spatio-temporal. This review's findings identify major limitations in applications found in the literature, namely lack of coverage, lack of diversity of data and lack of inclusion of context-specific features. This review proposes directions for future research and underlines practical implications in healthcare, urban planning, global synergy and smart cities.
- [353] arXiv:2401.01801 (cross-list from cs.LG) [ pdf , other ]
-
Title: A quatum inspired neural network for geometric modelingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
Abstract: By conceiving physical systems as 3D many-body point clouds, geometric graph neural networks (GNNs), such as SE(3)/E(3) equivalent GNNs, have showcased promising performance. In particular, their effective message-passing mechanics make them adept at modeling molecules and crystalline materials. However, current geometric GNNs only offer a mean-field approximation of the many-body system, encapsulated within two-body message passing, thus falling short in capturing intricate relationships within these geometric graphs. To address this limitation, tensor networks, widely employed by computational physics to handle manybody systems using high-order tensors, have been introduced. Nevertheless, integrating these tensorized networks into the message-passing framework of GNNs faces scalability and symmetry conservation (e.g., permutation and rotation) challenges. In response, we introduce an innovative equivariant Matrix Product State (MPS)-based message-passing strategy, through achieving an efficient implementation of the tensor contraction operation. Our method effectively models complex many-body relationships, suppressing mean-field approximations, and captures symmetries within geometric graphs. Importantly, it seamlessly replaces the standard message-passing and layer-aggregation modules intrinsic to geometric GNNs. We empirically validate the superior accuracy of our approach on benchmark tasks, including predicting classical Newton systems and quantum tensor Hamiltonian matrices. To our knowledge, our approach represents the inaugural utilization of parameterized geometric tensor networks.
- [354] arXiv:2401.01830 (cross-list from cs.CL) [ pdf , other ]
-
Title: Iterative Mask Filling: An Effective Text Augmentation Method Using Masked Language ModelingComments: Published in International Conference on Advanced Engineering, Technology and Applications (ICAETA 2023). The final version is available online at this https URLJournal-ref: Communications in Computer and Information Science, vol. 1983, 450-463, Springer, 2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Data augmentation is an effective technique for improving the performance of machine learning models. However, it has not been explored as extensively in natural language processing (NLP) as it has in computer vision. In this paper, we propose a novel text augmentation method that leverages the Fill-Mask feature of the transformer-based BERT model. Our method involves iteratively masking words in a sentence and replacing them with language model predictions. We have tested our proposed method on various NLP tasks and found it to be effective in many cases. Our results are presented along with a comparison to existing augmentation methods. Experimental results show that our proposed method significantly improves performance, especially on topic classification datasets.
- [355] arXiv:2401.01835 (cross-list from cs.IT) [ pdf , ps , other ]
-
Title: Concurrent Brainstorming & Hypothesis Satisfying: An Iterative Framework for Enhanced Retrieval-Augmented Generation (R2CBR3H-SR)Authors: Arash ShahmansooriComments: 3 pages, 1 table, double column IEEE journal format paperSubjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Addressing the complexity of comprehensive information retrieval, this study introduces an innovative, iterative retrieval-augmented generation system. Our approach uniquely integrates a vector-space driven re-ranking mechanism with concurrent brainstorming to expedite the retrieval of highly relevant documents, thereby streamlining the generation of potential queries. This sets the stage for our novel hybrid process, which synergistically combines hypothesis formulation with satisfying decision-making strategy to determine content adequacy, leveraging a chain of thought-based prompting technique. This unified hypothesize-satisfied phase intelligently distills information to ascertain whether user queries have been satisfactorily addressed. Upon reaching this criterion, the system refines its output into a concise representation, maximizing conceptual density with minimal verbosity. The iterative nature of the workflow enhances process efficiency and accuracy. Crucially, the concurrency within the brainstorming phase significantly accelerates recursive operations, facilitating rapid convergence to solution satisfaction. Compared to conventional methods, our system demonstrates a marked improvement in computational time and cost-effectiveness. This research advances the state-of-the-art in intelligent retrieval systems, setting a new benchmark for resource-efficient information extraction and abstraction in knowledge-intensive applications.
- [356] arXiv:2401.01843 (cross-list from cs.CL) [ pdf , other ]
-
Title: Investigating Semi-Supervised Learning Algorithms in Text DatasetsComments: Innovations in Intelligent Systems and Applications Conference (ASYU)Journal-ref: 2022 Innovations in Intelligent Systems and Applications Conference (ASYU), published in IEEE XploreSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Using large training datasets enhances the generalization capabilities of neural networks. Semi-supervised learning (SSL) is useful when there are few labeled data and a lot of unlabeled data. SSL methods that use data augmentation are most successful for image datasets. In contrast, texts do not have consistent augmentation methods as images. Consequently, methods that use augmentation are not as effective in text data as they are in image data. In this study, we compared SSL algorithms that do not require augmentation; these are self-training, co-training, tri-training, and tri-training with disagreement. In the experiments, we used 4 different text datasets for different tasks. We examined the algorithms from a variety of perspectives by asking experiment questions and suggested several improvements. Among the algorithms, tri-training with disagreement showed the closest performance to the Oracle; however, performance gap shows that new semi-supervised algorithms or improvements in existing methods are needed.
- [357] arXiv:2401.01851 (cross-list from cs.LG) [ pdf , other ]
-
Title: The Power of Training: How Different Neural Network Setups Influence the Energy DemandSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Performance (cs.PF)
Abstract: This work offers a heuristic evaluation of the effects of variations in machine learning training regimes and learning paradigms on the energy consumption of computing, especially HPC hardware with a life-cycle aware perspective. While increasing data availability and innovation in high-performance hardware fuels the training of sophisticated models, it also fosters the fading perception of energy consumption and carbon emission. Therefore, the goal of this work is to raise awareness about the energy impact of general training parameters and processes, from learning rate over batch size to knowledge transfer. Multiple setups with different hyperparameter configurations are evaluated on three different hardware systems. Among many results, we have found out that even with the same model and hardware to reach the same accuracy, improperly set training hyperparameters consume up to 5 times the energy of the optimal setup. We also extensively examined the energy-saving benefits of learning paradigms including recycling knowledge through pretraining and sharing knowledge through multitask training.
- [358] arXiv:2401.01854 (cross-list from cs.CL) [ pdf , other ]
-
Title: Multilingual Instruction Tuning With Just a Pinch of MultilingualitySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-following capabilities to other languages from even monolingual tuning. Furthermore, we find that only 40 multilingual examples integrated in an English tuning set substantially improve multilingual instruction-following, both in seen and unseen languages during tuning. In general, we observe that models tuned on multilingual mixtures exhibit comparable or superior performance in multiple languages compared to monolingually tuned models, despite training on 10x fewer examples in those languages. Finally, we find that diversifying the instruction tuning set with even just 2-4 languages significantly improves cross-lingual generalization. Our results suggest that building massively multilingual instruction-tuned models can be done with only a very small set of multilingual instruction-responses.
- [359] arXiv:2401.01868 (cross-list from cs.CV) [ pdf , other ]
-
Title: Step length measurement in the wild using FMCW radarAuthors: Parthipan Siva , Alexander Wong , Patricia Hewston , George Ioannidis , Dr. Jonathan Adachi , Dr. Alexander Rabinovich , Andrea Lee , Alexandra PapaioannouSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: With an aging population, numerous assistive and monitoring technologies are under development to enable older adults to age in place. To facilitate aging in place predicting risk factors such as falls, and hospitalization and providing early interventions are important. Much of the work on ambient monitoring for risk prediction has centered on gait speed analysis, utilizing privacy-preserving sensors like radar. Despite compelling evidence that monitoring step length, in addition to gait speed, is crucial for predicting risk, radar-based methods have not explored step length measurement in the home. Furthermore, laboratory experiments on step length measurement using radars are limited to proof of concept studies with few healthy subjects. To address this gap, a radar-based step length measurement system for the home is proposed based on detection and tracking using radar point cloud, followed by Doppler speed profiling of the torso to obtain step lengths in the home. The proposed method was evaluated in a clinical environment, involving 35 frail older adults, to establish its validity. Additionally, the method was assessed in people's homes, with 21 frail older adults who had participated in the clinical assessment. The proposed radar-based step length measurement method was compared to the gold standard Zeno Walkway Gait Analysis System, revealing a 4.5cm/8.3% error in a clinical setting. Furthermore, it exhibited excellent reliability (ICC(2,k)=0.91, 95% CI 0.82 to 0.96) in uncontrolled home settings. The method also proved accurate in uncontrolled home settings, as indicated by a strong agreement (ICC(3,k)=0.81 (95% CI 0.53 to 0.92)) between home measurements and in-clinic assessments.
- [360] arXiv:2401.01943 (cross-list from cs.CL) [ pdf , other ]
-
Title: Generalist embedding models are better at short-context clinical semantic search than specialized embedding modelsAuthors: Jean-Baptiste Excoffier , Tom Roehr , Alexei Figueroa , Jens-Michalis Papaioannou , Keno Bressem , Matthieu OrtalaComments: 11 pages, 1 figure, 5 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The increasing use of tools and solutions based on Large Language Models (LLMs) for various tasks in the medical domain has become a prominent trend. Their use in this highly critical and sensitive domain has thus raised important questions about their robustness, especially in response to variations in input, and the reliability of the generated outputs. This study addresses these questions by constructing a textual dataset based on the ICD-10-CM code descriptions, widely used in US hospitals and containing many clinical terms, and their easily reproducible rephrasing. We then benchmarked existing embedding models, either generalist or specialized in the clinical domain, in a semantic search task where the goal was to correctly match the rephrased text to the original description. Our results showed that generalist models performed better than clinical models, suggesting that existing clinical specialized models are more sensitive to small changes in input that confuse them. The highlighted problem of specialized models may be due to the fact that they have not been trained on sufficient data, and in particular on datasets that are not diverse enough to have a reliable global language understanding, which is still necessary for accurate handling of medical documents.
- [361] arXiv:2401.01951 (cross-list from cs.CV) [ pdf , other ]
-
Title: Can We Generate Realistic Hands Only Using Convolution?Comments: Contains 17 pages, 14 figures, and 6 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The enduring inability of image generative models to recreate intricate geometric features, such as those present in human hands and fingers has been an ongoing problem in image generation for nearly a decade. While strides have been made by increasing model sizes and diversifying training datasets, this issue remains prevalent across all models, from denoising diffusion models to Generative Adversarial Networks (GAN), pointing to a fundamental shortcoming in the underlying architectures. In this paper, we demonstrate how this problem can be mitigated by augmenting convolution layers geometric capabilities through providing them with a single input channel incorporating the relative $n$-dimensional Cartesian coordinate system. We show that this drastically improves quality of hand and face images generated by GANs and Variational AutoEncoders (VAE).
- [362] arXiv:2401.01952 (cross-list from cs.CV) [ pdf , other ]
-
Title: Instruct-Imagen: Image Generation with Multi-modal InstructionAuthors: Hexiang Hu , Kelvin C.K. Chan , Yu-Chuan Su , Wenhu Chen , Yandong Li , Kihyuk Sohn , Yang Zhao , Xue Ben , Boqing Gong , William Cohen , Ming-Wei Chang , Xuhui JiaComments: 20 pages, 18 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: This paper presents instruct-imagen, a model that tackles heterogeneous image generation tasks and generalizes across unseen tasks. We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision. It uses natural language to amalgamate disparate modalities (e.g., text, edge, style, subject, etc.), such that abundant generation intents can be standardized in a uniform format.
We then build instruct-imagen by fine-tuning a pre-trained text-to-image diffusion model with a two-stage framework. First, we adapt the model using the retrieval-augmented training, to enhance model's capabilities to ground its generation on external multimodal context. Subsequently, we fine-tune the adapted model on diverse image generation tasks that requires vision-language understanding (e.g., subject-driven generation, etc.), each paired with a multi-modal instruction encapsulating the task's essence. Human evaluation on various image generation datasets reveals that instruct-imagen matches or surpasses prior task-specific models in-domain and demonstrates promising generalization to unseen and more complex tasks. - [363] arXiv:2401.01967 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and ToxicityAuthors: Andrew Lee , Xiaoyan Bai , Itamar Pres , Martin Wattenberg , Jonathan K. Kummerfeld , Rada MihalceaSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While alignment algorithms are now commonly used to tune pre-trained language models towards a user's preferences, we lack explanations for the underlying mechanisms in which models become ``aligned'', thus making it difficult to explain phenomena like jailbreaks. In this work we study a popular algorithm, direct preference optimization (DPO), and the mechanisms by which it reduces toxicity. Namely, we first study how toxicity is represented and elicited in a pre-trained language model, GPT2-medium. We then apply DPO with a carefully crafted pairwise dataset to reduce toxicity. We examine how the resulting model averts toxic outputs, and find that capabilities learned from pre-training are not removed, but rather bypassed. We use this insight to demonstrate a simple method to un-align the model, reverting it back to its toxic behavior.
- [364] arXiv:2401.01970 (cross-list from cs.CV) [ pdf , other ]
-
Title: FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene UnderstandingComments: 19 pages, Project page coming soonSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Precisely perceiving the geometric and semantic properties of real-world 3D objects is crucial for the continued evolution of augmented reality and robotic applications. To this end, we present \algfull{} (\algname{}), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS). The key contribution of this work is an efficient method to reconstruct and represent 3D vision-language models. This is achieved by distilling feature maps generated from image-based foundation models into those rendered from our 3D model. To ensure high-quality rendering and fast training, we introduce a novel scene representation by integrating strengths from both GS and multi-resolution hash encodings (MHE). Our effective training procedure also introduces a pixel alignment loss that makes the rendered feature distance of same semantic entities close, following the pixel-level semantic boundaries. Our results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by $\mathbf{10.2}$ percent on open-vocabulary language-based object detection, despite that we are $\mathbf{851\times}$ faster for inference. This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments. We plan to release the code upon paper acceptance.
- [365] arXiv:2401.01974 (cross-list from cs.CV) [ pdf , other ]
-
Title: Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as ProgrammersSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Visual reasoning is dominated by end-to-end neural networks scaled to billions of model parameters and training examples. However, even the largest models struggle with compositional reasoning, generalization, fine-grained spatial and temporal reasoning, and counting. Visual reasoning with large language models (LLMs) as controllers can, in principle, address these limitations by decomposing the task and solving subtasks by orchestrating a set of (visual) tools. Recently, these models achieved great performance on tasks such as compositional visual question answering, visual grounding, and video temporal reasoning. Nevertheless, in their current form, these models heavily rely on human engineering of in-context examples in the prompt, which are often dataset- and task-specific and require significant labor by highly skilled programmers. In this work, we present a framework that mitigates these issues by introducing spatially and temporally abstract routines and by leveraging a small number of labeled examples to automatically generate in-context examples, thereby avoiding human-created in-context examples. On a number of visual reasoning tasks, we show that our framework leads to consistent gains in performance, makes LLMs as controllers setup more robust, and removes the need for human engineering of in-context examples.
- [366] arXiv:2401.01989 (cross-list from cs.CL) [ pdf , other ]
-
Title: Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position BiasComments: Accepted to NAACL 2024 Main ConferenceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We characterize and study zero-shot abstractive summarization in Large Language Models (LLMs) by measuring position bias, which we propose as a general formulation of the more restrictive lead bias phenomenon studied previously in the literature. Position bias captures the tendency of a model unfairly prioritizing information from certain parts of the input text over others, leading to undesirable behavior. Through numerous experiments on four diverse real-world datasets, we study position bias in multiple LLM models such as GPT 3.5-Turbo, Llama-2, and Dolly-v2, as well as state-of-the-art pretrained encoder-decoder abstractive summarization models such as Pegasus and BART. Our findings lead to novel insights and discussion on performance and position bias of models for zero-shot summarization tasks.
- [367] arXiv:2401.01990 (cross-list from cs.CV) [ pdf , other ]
-
Title: GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised LearningSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We propose Guided Positive Sampling Self-Supervised Learning (GPS-SSL), a general method to inject a priori knowledge into Self-Supervised Learning (SSL) positive samples selection. Current SSL methods leverage Data-Augmentations (DA) for generating positive samples and incorporate prior knowledge - an incorrect, or too weak DA will drastically reduce the quality of the learned representation. GPS-SSL proposes instead to design a metric space where Euclidean distances become a meaningful proxy for semantic relationship. In that space, it is now possible to generate positive samples from nearest neighbor sampling. Any prior knowledge can now be embedded into that metric space independently from the employed DA. From its simplicity, GPS-SSL is applicable to any SSL method, e.g. SimCLR or BYOL. A key benefit of GPS-SSL is in reducing the pressure in tailoring strong DAs. For example GPS-SSL reaches 85.58% on Cifar10 with weak DA while the baseline only reaches 37.51%. We therefore move a step forward towards the goal of making SSL less reliant on DA. We also show that even when using strong DAs, GPS-SSL outperforms the baselines on under-studied domains. We evaluate GPS-SSL along with multiple baseline SSL methods on numerous downstream datasets from different domains when the models use strong or minimal data augmentations. We hope that GPS-SSL will open new avenues in studying how to inject a priori knowledge into SSL in a principled manner.
- [368] arXiv:2401.01993 (cross-list from cs.RO) [ pdf , other ]
-
Title: On Time-Indexing as Inductive Bias in Deep RL for Sequential Manipulation TasksSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: While solving complex manipulation tasks, manipulation policies often need to learn a set of diverse skills to accomplish these tasks. The set of skills is often quite multimodal - each one may have a quite distinct distribution of actions and states. Standard deep policy-learning algorithms often model policies as deep neural networks with a single output head (deterministic or stochastic). This structure requires the network to learn to switch between modes internally, which can lead to lower sample efficiency and poor performance. In this paper we explore a simple structure which is conducive to skill learning required for so many of the manipulation tasks. Specifically, we propose a policy architecture that sequentially executes different action heads for fixed durations, enabling the learning of primitive skills such as reaching and grasping. Our empirical evaluation on the Metaworld tasks reveals that this simple structure outperforms standard policy learning methods, highlighting its potential for improved skill acquisition.
- [369] arXiv:2401.02009 (cross-list from cs.CL) [ pdf , other ]
-
Title: Self-Contrast: Better Reflection Through Inconsistent Solving PerspectivesAuthors: Wenqi Zhang , Yongliang Shen , Linjuan Wu , Qiuying Peng , Jun Wang , Yueting Zhuang , Weiming LuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The reflection capacity of Large Language Model (LLM) has garnered extensive attention. A post-hoc prompting strategy, e.g., reflexion and self-refine, refines LLM's response based on self-evaluated or external feedback. However, recent research indicates without external feedback, LLM's intrinsic reflection is unstable. Our investigation unveils that the key bottleneck is the quality of the self-evaluated feedback. We find LLMs often exhibit overconfidence or high randomness when self-evaluate, offering stubborn or inconsistent feedback, which causes poor reflection. To remedy this, we advocate Self-Contrast: It adaptively explores diverse solving perspectives tailored to the request, contrasts the differences, and summarizes these discrepancies into a checklist which could be used to re-examine and eliminate discrepancies. Our method endows LLM with diverse perspectives to alleviate stubborn biases. Moreover, their discrepancies indicate potential errors or inherent uncertainties that LLM often overlooks. Reflecting upon these can catalyze more accurate and stable reflection. Experiments conducted on a series of reasoning and translation tasks with different LLMs serve to underscore the effectiveness and generality of our strategy.
- [370] arXiv:2401.02051 (cross-list from cs.NE) [ pdf , other ]
-
Title: An Example of Evolutionary Computation + Large Language Model Beating Human: Design of Efficient Guided Local SearchAuthors: Fei Liu , Xialiang Tong , Mingxuan Yuan , Xi Lin , Fu Luo , Zhenkun Wang , Zhichao Lu , Qingfu ZhangSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: It is often very tedious for human experts to design efficient algorithms. Recently, we have proposed a novel Algorithm Evolution using Large Language Model (AEL) framework for automatic algorithm design. AEL combines the power of a large language model and the paradigm of evolutionary computation to design, combine, and modify algorithms automatically. In this paper, we use AEL to design the guide algorithm for guided local search (GLS) to solve the well-known traveling salesman problem (TSP). AEL automatically evolves elite GLS algorithms in two days, with minimal human effort and no model training. Experimental results on 1,000 TSP20-TSP100 instances and TSPLib instances show that AEL-designed GLS outperforms state-of-the-art human-designed GLS with the same iteration budget. It achieves a 0% gap on TSP20 and TSP50 and a 0.032% gap on TSP100 in 1,000 iterations. Our findings mark the emergence of a new era in automatic algorithm design.
- [371] arXiv:2401.02092 (cross-list from cs.NE) [ pdf , other ]
-
Title: k-Winners-Take-All Ensemble Neural NetworkComments: Presented in ICONIP 2021Subjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: Ensembling is one approach that improves the performance of a neural network by combining a number of independent neural networks, usually by either averaging or summing up their individual outputs. We modify this ensembling approach by training the sub-networks concurrently instead of independently. This concurrent training of sub-networks leads them to cooperate with each other, and we refer to them as "cooperative ensemble". Meanwhile, the mixture-of-experts approach improves a neural network performance by dividing up a given dataset to its sub-networks. It then uses a gating network that assigns a specialization to each of its sub-networks called "experts". We improve on these aforementioned ways for combining a group of neural networks by using a k-Winners-Take-All (kWTA) activation function, that acts as the combination method for the outputs of each sub-network in the ensemble. We refer to this proposed model as "kWTA ensemble neural networks" (kWTA-ENN). With the kWTA activation function, the losing neurons of the sub-networks are inhibited while the winning neurons are retained. This results in sub-networks having some form of specialization but also sharing knowledge with one another. We compare our approach with the cooperative ensemble and mixture-of-experts, where we used a feed-forward neural network with one hidden layer having 100 neurons as the sub-network architecture. Our approach yields a better performance compared to the baseline models, reaching the following test accuracies on benchmark datasets: 98.34% on MNIST, 88.06% on Fashion-MNIST, 91.56% on KMNIST, and 95.97% on WDBC.
- [372] arXiv:2401.02117 (cross-list from cs.RO) [ pdf , other ]
-
Title: Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body TeleoperationComments: Project website: this https URL (Zipeng Fu and Tony Z. Zhao are project co-leads, Chelsea Finn is the advisor)Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: Imitation learning from human demonstrations has shown impressive performance in robotics. However, most results focus on table-top manipulation, lacking the mobility and dexterity necessary for generally useful tasks. In this work, we develop a system for imitating mobile manipulation tasks that are bimanual and require whole-body control. We first present Mobile ALOHA, a low-cost and whole-body teleoperation system for data collection. It augments the ALOHA system with a mobile base, and a whole-body teleoperation interface. Using data collected with Mobile ALOHA, we then perform supervised behavior cloning and find that co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations for each task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex mobile manipulation tasks such as sauteing and serving a piece of shrimp, opening a two-door wall cabinet to store heavy cooking pots, calling and entering an elevator, and lightly rinsing a used pan using a kitchen faucet. Project website: this https URL
- [373] arXiv:2401.02132 (cross-list from cs.CL) [ pdf , other ]
-
Title: DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language ModelsAuthors: Wendi Cui , Jiaxin Zhang , Zhuohang Li , Lopez Damien , Kamalika Das , Bradley Malin , Sricharan KumarSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Evaluating the quality and variability of text generated by Large Language Models (LLMs) poses a significant, yet unresolved research challenge. Traditional evaluation methods, such as ROUGE and BERTScore, which measure token similarity, often fail to capture the holistic semantic equivalence. This results in a low correlation with human judgments and intuition, which is especially problematic in high-stakes applications like healthcare and finance where reliability, safety, and robust decision-making are highly critical. This work proposes DCR, an automated framework for evaluating and improving the consistency of LLM-generated texts using a divide-conquer-reasoning approach. Unlike existing LLM-based evaluators that operate at the paragraph level, our method employs a divide-and-conquer evaluator (DCE) that breaks down the paragraph-to-paragraph comparison between two generated responses into individual sentence-to-paragraph comparisons, each evaluated based on predefined criteria. To facilitate this approach, we introduce an automatic metric converter (AMC) that translates the output from DCE into an interpretable numeric score. Beyond the consistency evaluation, we further present a reason-assisted improver (RAI) that leverages the analytical reasons with explanations identified by DCE to generate new responses aimed at reducing these inconsistencies. Through comprehensive and systematic empirical analysis, we show that our approach outperforms state-of-the-art methods by a large margin (e.g., +19.3% and +24.3% on the SummEval dataset) in evaluating the consistency of LLM generation across multiple benchmarks in semantic, factual, and summarization consistency tasks. Our approach also substantially reduces nearly 90% of output inconsistencies, showing promise for effective hallucination mitigation.
- [374] arXiv:2401.02137 (cross-list from cs.CV) [ pdf , other ]
-
Title: SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal AlignmentSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Multimodal alignment between language and vision is the fundamental topic in current vision-language model research. Contrastive Captioners (CoCa), as a representative method, integrates Contrastive Language-Image Pretraining (CLIP) and Image Caption (IC) into a unified framework, resulting in impressive results. CLIP imposes a bidirectional constraints on global representation of entire images and sentences. Although IC conducts an unidirectional image-to-text generation on local representation, it lacks any constraint on local text-to-image reconstruction, which limits the ability to understand images at a fine-grained level when aligned with texts. To achieve multimodal alignment from both global and local perspectives, this paper proposes Symmetrizing Contrastive Captioners (SyCoCa), which introduces bidirectional interactions on images and texts across the global and local representation levels. Specifically, we expand a Text-Guided Masked Image Modeling (TG-MIM) head based on ITC and IC heads. The improved SyCoCa can further leverage textual cues to reconstruct contextual images and visual cues to predict textual contents. When implementing bidirectional local interactions, the local contents of images tend to be cluttered or unrelated to their textual descriptions. Thus, we employ an attentive masking strategy to select effective image patches for interaction. Extensive experiments on five vision-language tasks, including image-text retrieval, image-captioning, visual question answering, and zero-shot/finetuned image classification, validate the effectiveness of our proposed method.
- [375] arXiv:2401.02143 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph Neural Networks for Tabular Data Learning: A Survey with Taxonomy and DirectionsComments: Under review, ongoing work, Github page: this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Social and Information Networks (cs.SI)
Abstract: In this survey, we dive into Tabular Data Learning (TDL) using Graph Neural Networks (GNNs), a domain where deep learning-based approaches have increasingly shown superior performance in both classification and regression tasks compared to traditional methods. The survey highlights a critical gap in deep neural TDL methods: the underrepresentation of latent correlations among data instances and feature values. GNNs, with their innate capability to model intricate relationships and interactions between diverse elements of tabular data, have garnered significant interest and application across various TDL domains. Our survey provides a systematic review of the methods involved in designing and implementing GNNs for TDL (GNN4TDL). It encompasses a detailed investigation into the foundational aspects and an overview of GNN-based TDL methods, offering insights into their evolving landscape. We present a comprehensive taxonomy focused on constructing graph structures and representation learning within GNN-based TDL methods. In addition, the survey examines various training plans, emphasizing the integration of auxiliary tasks to enhance the effectiveness of instance representations. A critical part of our discussion is dedicated to the practical application of GNNs across a spectrum of GNN4TDL scenarios, demonstrating their versatility and impact. Lastly, we discuss the limitations and propose future research directions, aiming to spur advancements in GNN4TDL. This survey serves as a resource for researchers and practitioners, offering a thorough understanding of GNNs' role in revolutionizing TDL and pointing towards future innovations in this promising area.
- [376] arXiv:2401.02153 (cross-list from cs.SE) [ pdf , other ]
-
Title: Unit Testing in ASP Revisited: Language and Test-Driven Development EnvironmentSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Unit testing frameworks are nowadays considered a best practice, included in almost all modern software development processes, to achieve rapid development of correct specifications. Knowledge representation and reasoning paradigms such as Answer Set Programming (ASP), that have been used in industry-level applications, are not an exception. Indeed, the first unit testing specification language for ASP was proposed in 2011 as a feature of the ASPIDE development environment. Later, a more portable unit testing language was included in the LANA annotation language. In this paper we revisit both languages and tools for unit testing in ASP. We propose a new unit test specification language that allows one to inline tests within ASP programs, and we identify the computational complexity of the tasks associated with checking the various program-correctness assertions. Test-case specifications are transparent to the traditional evaluation, but can be interpreted by a specific testing tool. Thus, we present a novel environment supporting test driven development of ASP programs.
- [377] arXiv:2401.02154 (cross-list from cs.LG) [ pdf , other ]
-
Title: Disentangle Estimation of Causal Effects from Cross-Silo DataAuthors: Yuxuan Liu , Haozhao Wang , Shuang Wang , Zhiming He , Wenchao Xu , Jialiang Zhu , Fan YangComments: Accepted by ICASSP 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Methodology (stat.ME)
Abstract: Estimating causal effects among different events is of great importance to critical fields such as drug development. Nevertheless, the data features associated with events may be distributed across various silos and remain private within respective parties, impeding direct information exchange between them. This, in turn, can result in biased estimations of local causal effects, which rely on the characteristics of only a subset of the covariates. To tackle this challenge, we introduce an innovative disentangle architecture designed to facilitate the seamless cross-silo transmission of model parameters, enriched with causal mechanisms, through a combination of shared and private branches. Besides, we introduce global constraints into the equation to effectively mitigate bias within the various missing domains, thereby elevating the accuracy of our causal effect estimation. Extensive experiments conducted on new semi-synthetic datasets show that our method outperforms state-of-the-art baselines.
- [378] arXiv:2401.02158 (cross-list from cs.CL) [ pdf , other ]
-
Title: Shayona@SMM4H23: COVID-19 Self diagnosis classification using BERT and LightGBM modelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper describes approaches and results for shared Task 1 and 4 of SMMH4-23 by Team Shayona. Shared Task-1 was binary classification of english tweets self-reporting a COVID-19 diagnosis, and Shared Task-4 was Binary classification of English Reddit posts self-reporting a social anxiety disorder diagnosis. Our team has achieved the highest f1-score 0.94 in Task-1 among all participants. We have leveraged the Transformer model (BERT) in combination with the LightGBM model for both tasks.
- [379] arXiv:2401.02173 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Prompt Decoupling for Text-to-Image Person Re-identificationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Text-to-image person re-identification (TIReID) aims to retrieve the target person from an image gallery via a textual description query. Recently, pre-trained vision-language models like CLIP have attracted significant attention and have been widely utilized for this task due to their robust capacity for semantic concept learning and rich multi-modal knowledge. However, recent CLIP-based TIReID methods commonly rely on direct fine-tuning of the entire network to adapt the CLIP model for the TIReID task. Although these methods show competitive performance on this topic, they are suboptimal as they necessitate simultaneous domain adaptation and task adaptation. To address this issue, we attempt to decouple these two processes during the training stage. Specifically, we introduce the prompt tuning strategy to enable domain adaptation and propose a two-stage training approach to disentangle domain adaptation from task adaptation. In the first stage, we freeze the two encoders from CLIP and solely focus on optimizing the prompts to alleviate domain gap between the original training data of CLIP and downstream tasks. In the second stage, we maintain the fixed prompts and fine-tune the CLIP model to prioritize capturing fine-grained information, which is more suitable for TIReID task. Finally, we evaluate the effectiveness of our method on three widely used datasets. Compared to the directly fine-tuned approach, our method achieves significant improvements.
- [380] arXiv:2401.02183 (cross-list from cs.LG) [ pdf , other ]
-
Title: FairGridSearch: A Framework to Compare Fairness-Enhancing ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Machine learning models are increasingly used in critical decision-making applications. However, these models are susceptible to replicating or even amplifying bias present in real-world data. While there are various bias mitigation methods and base estimators in the literature, selecting the optimal model for a specific application remains challenging.
This paper focuses on binary classification and proposes FairGridSearch, a novel framework for comparing fairness-enhancing models. FairGridSearch enables experimentation with different model parameter combinations and recommends the best one. The study applies FairGridSearch to three popular datasets (Adult, COMPAS, and German Credit) and analyzes the impacts of metric selection, base estimator choice, and classification threshold on model fairness.
The results highlight the significance of selecting appropriate accuracy and fairness metrics for model evaluation. Additionally, different base estimators and classification threshold values affect the effectiveness of bias mitigation methods and fairness stability respectively, but the effects are not consistent across all datasets. Based on these findings, future research on fairness in machine learning should consider a broader range of factors when building fair models, going beyond bias mitigation methods alone. - [381] arXiv:2401.02199 (cross-list from eess.SY) [ pdf , other ]
-
Title: LADRI: LeArning-based Dynamic Risk Indicator in Automated Driving SystemComments: 2023 IEEE International Test Conference, 8th Edition of Automotive, Reliability, Test & Safety Workshop in Disneyland, Anaheim, CASubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Software Engineering (cs.SE)
Abstract: As the horizon of intelligent transportation expands with the evolution of Automated Driving Systems (ADS), ensuring paramount safety becomes more imperative than ever. Traditional risk assessment methodologies, primarily crafted for human-driven vehicles, grapple to adequately adapt to the multifaceted, evolving environments of ADS. This paper introduces a framework for real-time Dynamic Risk Assessment (DRA) in ADS, harnessing the potency of Artificial Neural Networks (ANNs).
Our proposed solution transcends these limitations, drawing upon ANNs, a cornerstone of deep learning, to meticulously analyze and categorize risk dimensions using real-time On-board Sensor (OBS) data. This learning-centric approach not only elevates the ADS's situational awareness but also enriches its understanding of immediate operational contexts. By dissecting OBS data, the system is empowered to pinpoint its current risk profile, thereby enhancing safety prospects for onboard passengers and the broader traffic ecosystem.
Through this framework, we chart a direction in risk assessment, bridging the conventional voids and enhancing the proficiency of ADS. By utilizing ANNs, our methodology offers a perspective, allowing ADS to adeptly navigate and react to potential risk factors, ensuring safer and more informed autonomous journeys. - [382] arXiv:2401.02212 (cross-list from cs.CL) [ pdf , other ]
-
Title: Joint Multi-Facts Reasoning Network For Complex Temporal Question Answering Over Knowledge GraphSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Temporal Knowledge Graph (TKG) is an extension of regular knowledge graph by attaching the time scope. Existing temporal knowledge graph question answering (TKGQA) models solely approach simple questions, owing to the prior assumption that each question only contains a single temporal fact with explicit/implicit temporal constraints. Hence, they perform poorly on questions which own multiple temporal facts. In this paper, we propose \textbf{\underline{J}}oint \textbf{\underline{M}}ulti \textbf{\underline{F}}acts \textbf{\underline{R}}easoning \textbf{\underline{N}}etwork (JMFRN), to jointly reasoning multiple temporal facts for accurately answering \emph{complex} temporal questions. Specifically, JMFRN first retrieves question-related temporal facts from TKG for each entity of the given complex question. For joint reasoning, we design two different attention (\ie entity-aware and time-aware) modules, which are suitable for universal settings, to aggregate entities and timestamps information of retrieved facts. Moreover, to filter incorrect type answers, we introduce an additional answer type discrimination task. Extensive experiments demonstrate our proposed method significantly outperforms the state-of-art on the well-known complex temporal question benchmark TimeQuestions.
- [383] arXiv:2401.02244 (cross-list from cs.LG) [ pdf , other ]
-
Title: Policy-regularized Offline Multi-objective Reinforcement LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we aim to utilize only offline trajectory data to train a policy for multi-objective RL. We extend the offline policy-regularized method, a widely-adopted approach for single-objective offline RL problems, into the multi-objective setting in order to achieve the above goal. However, such methods face a new challenge in offline MORL settings, namely the preference-inconsistent demonstration problem. We propose two solutions to this problem: 1) filtering out preference-inconsistent demonstrations via approximating behavior preferences, and 2) adopting regularization techniques with high policy expressiveness. Moreover, we integrate the preference-conditioned scalarized update method into policy-regularized offline RL, in order to simultaneously learn a set of policies using a single policy network, thus reducing the computational cost induced by the training of a large number of individual policies for various preferences. Finally, we introduce Regularization Weight Adaptation to dynamically determine appropriate regularization weights for arbitrary target preferences during deployment. Empirical results on various multi-objective datasets demonstrate the capability of our approach in solving offline MORL problems.
- [384] arXiv:2401.02258 (cross-list from cs.LG) [ pdf , other ]
-
Title: Uncertainty-Aware Deep Attention Recurrent Neural Network for Heterogeneous Time Series ImputationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Missingness is ubiquitous in multivariate time series and poses an obstacle to reliable downstream analysis. Although recurrent network imputation achieved the SOTA, existing models do not scale to deep architectures that can potentially alleviate issues arising in complex data. Moreover, imputation carries the risk of biased estimations of the ground truth. Yet, confidence in the imputed values is always unmeasured or computed post hoc from model output. We propose DEep Attention Recurrent Imputation (DEARI), which jointly estimates missing values and their associated uncertainty in heterogeneous multivariate time series. By jointly representing feature-wise correlations and temporal dynamics, we adopt a self attention mechanism, along with an effective residual component, to achieve a deep recurrent neural network with good imputation performance and stable convergence. We also leverage self-supervised metric learning to boost performance by optimizing sample similarity. Finally, we transform DEARI into a Bayesian neural network through a novel Bayesian marginalization strategy to produce stochastic DEARI, which outperforms its deterministic equivalent. Experiments show that DEARI surpasses the SOTA in diverse imputation tasks using real-world datasets, namely air quality control, healthcare and traffic.
- [385] arXiv:2401.02290 (cross-list from cs.LG) [ pdf , other ]
-
Title: Path-based Explanation for Knowledge Graph CompletionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Graph Neural Networks (GNNs) have achieved great success in Knowledge Graph Completion (KGC) by modelling how entities and relations interact in recent years. However, the explanation of the predicted facts has not caught the necessary attention. Proper explanations for the results of GNN-based KGC models increase model transparency and help researchers develop more reliable models. Existing practices for explaining KGC tasks rely on instance/subgraph-based approaches, while in some scenarios, paths can provide more user-friendly and interpretable explanations. Nonetheless, the methods for generating path-based explanations for KGs have not been well-explored. To address this gap, we propose Power-Link, the first path-based KGC explainer that explores GNN-based models. We design a novel simplified graph-powering technique, which enables the generation of path-based explanations with a fully parallelisable and memory-efficient training scheme. We further introduce three new metrics for quantitative evaluation of the explanations, together with a qualitative human evaluation. Extensive experiments demonstrate that Power-Link outperforms the SOTA baselines in interpretability, efficiency, and scalability.
- [386] arXiv:2401.02347 (cross-list from cs.CV) [ pdf , other ]
-
Title: Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only TrainingComments: AAAI 2024.Open sourced, Code and Model AvailableSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Image captioning aims at generating descriptive and meaningful textual descriptions of images, enabling a broad range of vision-language applications. Prior works have demonstrated that harnessing the power of Contrastive Image Language Pre-training (CLIP) offers a promising approach to achieving zero-shot captioning, eliminating the need for expensive caption annotations. However, the widely observed modality gap in the latent space of CLIP harms the performance of zero-shot captioning by breaking the alignment between paired image-text features. To address this issue, we conduct an analysis on the CLIP latent space which leads to two findings. Firstly, we observe that the CLIP's visual feature of image subregions can achieve closer proximity to the paired caption due to the inherent information loss in text descriptions. In addition, we show that the modality gap between a paired image-text can be empirically modeled as a zero-mean Gaussian distribution. Motivated by the findings, we propose a novel zero-shot image captioning framework with text-only training to reduce the modality gap. In particular, we introduce a subregion feature aggregation to leverage local region information, which produces a compact visual representation for matching text representation. Moreover, we incorporate a noise injection and CLIP reranking strategy to boost captioning performance. We also extend our framework to build a zero-shot VQA pipeline, demonstrating its generality. Through extensive experiments on common captioning and VQA datasets such as MSCOCO, Flickr30k and VQAV2, we show that our method achieves remarkable performance improvements. Code is available at this https URL .
- [387] arXiv:2401.02349 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Survey Analyzing Generalization in Deep Reinforcement LearningAuthors: Ezgi KorkmazSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Reinforcement learning research obtained significant success and attention with the utilization of deep neural networks to solve problems in high dimensional state or action spaces. While deep reinforcement learning policies are currently being deployed in many different fields from medical applications to self driving vehicles, there are still ongoing questions the field is trying to answer on the generalization capabilities of deep reinforcement learning policies. In this paper, we will outline the fundamental reasons why deep reinforcement learning policies encounter overfitting problems that limit their robustness and generalization capabilities. Furthermore, we will formalize and unify the diverse solution approaches to increase generalization, and overcome overfitting in state-action value functions. We believe our study can provide a compact systematic unified analysis for the current advancements in deep reinforcement learning, and help to construct robust deep neural policies with improved generalization abilities.
- [388] arXiv:2401.02383 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Survey of 3D Human Body Pose and Shape Estimation Methods for Contemporary Dance ApplicationsComments: arXiv admin note: text overlap with arXiv:2008.09062 by other authorsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: 3D human body shape and pose estimation from RGB images is a challenging problem with potential applications in augmented/virtual reality, healthcare and fitness technology and virtual retail. Recent solutions have focused on three types of inputs: i) single images, ii) multi-view images and iii) videos. In this study, we surveyed and compared 3D body shape and pose estimation methods for contemporary dance and performing arts, with a special focus on human body pose and dressing, camera viewpoint, illumination conditions and background conditions. We demonstrated that multi-frame methods, such as PHALP, provide better results than single-frame method for pose estimation when dancers are performing contemporary dances.
- [389] arXiv:2401.02385 (cross-list from cs.CL) [ pdf , other ]
-
Title: TinyLlama: An Open-Source Small Language ModelComments: Technical ReportSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs. Building on the architecture and tokenizer of Llama 2, TinyLlama leverages various advances contributed by the open-source community (e.g., FlashAttention), achieving better computational efficiency. Despite its relatively small size, TinyLlama demonstrates remarkable performance in a series of downstream tasks. It significantly outperforms existing open-source language models with comparable sizes. Our model checkpoints and code are publicly available on GitHub at this https URL .
- [390] arXiv:2401.02403 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Real-Time 2D Temperature Field Prediction in Metal Additive Manufacturing Using Physics-Informed Neural NetworksComments: 42 pages, 13 FiguresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Accurately predicting the temperature field in metal additive manufacturing (AM) processes is critical to preventing overheating, adjusting process parameters, and ensuring process stability. While physics-based computational models offer precision, they are often time-consuming and unsuitable for real-time predictions and online control in iterative design scenarios. Conversely, machine learning models rely heavily on high-quality datasets, which can be costly and challenging to obtain within the metal AM domain. Our work addresses this by introducing a physics-informed neural network framework specifically designed for temperature field prediction in metal AM. This framework incorporates a physics-informed input, physics-informed loss function, and a Convolutional Long Short-Term Memory (ConvLSTM) architecture. Utilizing real-time temperature data from the process, our model predicts 2D temperature fields for future timestamps across diverse geometries, deposition patterns, and process parameters. We validate the proposed framework in two scenarios: full-field temperature prediction for a thin wall and 2D temperature field prediction for cylinder and cubic parts, demonstrating errors below 3% and 1%, respectively. Our proposed framework exhibits the flexibility to be applied across diverse scenarios with varying process parameters, geometries, and deposition patterns.
- [391] arXiv:2401.02411 (cross-list from cs.CV) [ pdf , other ]
-
Title: What You See is What You GAN: Rendering Every Pixel for High-Fidelity Geometry in 3D GANsAuthors: Alex Trevithick , Matthew Chan , Towaki Takikawa , Umar Iqbal , Shalini De Mello , Manmohan Chandraker , Ravi Ramamoorthi , Koki NaganoComments: See our project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Abstract: 3D-aware Generative Adversarial Networks (GANs) have shown remarkable progress in learning to generate multi-view-consistent images and 3D geometries of scenes from collections of 2D images via neural volume rendering. Yet, the significant memory and computational costs of dense sampling in volume rendering have forced 3D GANs to adopt patch-based training or employ low-resolution rendering with post-processing 2D super resolution, which sacrifices multiview consistency and the quality of resolved geometry. Consequently, 3D GANs have not yet been able to fully resolve the rich 3D geometry present in 2D images. In this work, we propose techniques to scale neural volume rendering to the much higher resolution of native 2D images, thereby resolving fine-grained 3D geometry with unprecedented detail. Our approach employs learning-based samplers for accelerating neural rendering for 3D GAN training using up to 5 times fewer depth samples. This enables us to explicitly "render every pixel" of the full-resolution image during training and inference without post-processing superresolution in 2D. Together with our strategy to learn high-quality surface geometry, our method synthesizes high-resolution 3D geometry and strictly view-consistent images while maintaining image quality on par with baselines relying on post-processing super resolution. We demonstrate state-of-the-art 3D gemetric quality on FFHQ and AFHQ, setting a new standard for unsupervised learning of 3D shapes in 3D GANs.
- [392] arXiv:2401.02412 (cross-list from cs.LG) [ pdf , other ]
-
Title: LLM Augmented LLMs: Expanding Capabilities through CompositionAuthors: Rachit Bansal , Bidisha Samanta , Siddharth Dalmia , Nitish Gupta , Shikhar Vashishth , Sriram Ganapathy , Abhishek Bapna , Prateek Jain , Partha TalukdarComments: 17 pages, 2 figures, 8 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domains and tasks. In this work, we study the problem of efficient and practical composition of existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM -- Composition to Augment Language Models -- which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) Scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data, (ii) Existing model weights are kept intact, and hence preserves existing capabilities, and (iii) Applies to diverse domains and settings. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13\% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40\% over the base model for code generation and explanation tasks -- on-par with fully fine-tuned counterparts.
- [393] arXiv:2401.02416 (cross-list from cs.CV) [ pdf , other ]
-
Title: ODIN: A Single Model for 2D and 3D PerceptionAuthors: Ayush Jain , Pushkal Katara , Nikolaos Gkanatsios , Adam W. Harley , Gabriel Sarch , Kriti Aggarwal , Vishrav Chaudhary , Katerina FragkiadakiSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: State-of-the-art models on contemporary 3D perception benchmarks like ScanNet consume and label dataset-provided 3D point clouds, obtained through post processing of sensed multiview RGB-D images. They are typically trained in-domain, forego large-scale 2D pre-training and outperform alternatives that featurize the posed RGB-D multiview images instead. The gap in performance between methods that consume posed images versus post-processed 3D point clouds has fueled the belief that 2D and 3D perception require distinct model architectures. In this paper, we challenge this view and propose ODIN (Omni-Dimensional INstance segmentation), a model that can segment and label both 2D RGB images and 3D point clouds, using a transformer architecture that alternates between 2D within-view and 3D cross-view information fusion. Our model differentiates 2D and 3D feature operations through the positional encodings of the tokens involved, which capture pixel coordinates for 2D patch tokens and 3D coordinates for 3D feature tokens. ODIN achieves state-of-the-art performance on ScanNet200, Matterport3D and AI2THOR 3D instance segmentation benchmarks, and competitive performance on ScanNet, S3DIS and COCO. It outperforms all previous works by a wide margin when the sensed 3D point cloud is used in place of the point cloud sampled from 3D mesh. When used as the 3D perception engine in an instructable embodied agent architecture, it sets a new state-of-the-art on the TEACh action-from-dialogue benchmark. Our code and checkpoints can be found at the project website: this https URL .
- [394] arXiv:2401.02421 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: Neuronal Auditory Machine Intelligence (NEURO-AMI) In PerspectiveAuthors: Emmanuel Ndidi OsegiComments: Journal submission in progressSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: The recent developments in soft computing cannot be complete without noting the contributions of artificial neural machine learning systems that draw inspiration from real cortical tissue or processes that occur in human brain. The universal approximability of such neural systems has led to its wide spread use, and novel developments in this evolving technology has shown that there is a bright future for such Artificial Intelligent (AI) techniques in the soft computing field. Indeed, the proliferation of large and very deep networks of artificial neural systems and the corresponding enhancement and development of neural machine learning algorithms have contributed immensely to the development of the modern field of Deep Learning as may be found in the well documented research works of Lecun, Bengio and Hinton. However, the key requirements of end user affordability in addition to reduced complexity and reduced data learning size requirement means there still remains a need for the synthesis of more cost-efficient and less data-hungry artificial neural systems. In this report, we present an overview of a new competing bio-inspired continual learning neural tool Neuronal Auditory Machine Intelligence (Neuro-AMI) as a predictor detailing its functional and structural details, important aspects on right applicability, some recent application use cases and future research directions for current and prospective machine learning experts and data scientists.
- [395] arXiv:2401.02424 (cross-list from cs.CV) [ pdf , other ]
-
Title: Mapping of Land Use and Land Cover (LULC) using EuroSAT and Transfer LearningComments: 10 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: As the global population continues to expand, the demand for natural resources increases. Unfortunately, human activities account for 23% of greenhouse gas emissions. On a positive note, remote sensing technologies have emerged as a valuable tool in managing our environment. These technologies allow us to monitor land use, plan urban areas, and drive advancements in areas such as agriculture, climate change mitigation, disaster recovery, and environmental monitoring. Recent advances in AI, computer vision, and earth observation data have enabled unprecedented accuracy in land use mapping. By using transfer learning and fine-tuning with RGB bands, we achieved an impressive 99.19% accuracy in land use analysis. Such findings can be used to inform conservation and urban planning policies.
- [396] arXiv:2401.02425 (cross-list from cs.NI) [ pdf , ps , other ]
-
Title: UAV Trajectory Planning for AoI-Minimal Data Collection in UAV-Aided IoT Networks by TransformerJournal-ref: IEEE TWC, 2023Subjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
Abstract: Maintaining freshness of data collection in Internet-of-Things (IoT) networks has attracted increasing attention. By taking into account age-of-information (AoI), we investigate the trajectory planning problem of an unmanned aerial vehicle (UAV) that is used to aid a cluster-based IoT network. An optimization problem is formulated to minimize the total AoI of the collected data by the UAV from the ground IoT network. Since the total AoI of the IoT network depends on the flight time of the UAV and the data collection time at hovering points, we jointly optimize the selection of hovering points and the visiting order to these points. We exploit the state-of-the-art transformer and the weighted A*, which is a path search algorithm, to design a machine learning algorithm to solve the formulated problem. The whole UAV-IoT system is fed into the encoder network of the proposed algorithm, and the algorithm's decoder network outputs the visiting order to ground clusters. Then, the weighted A* is used to find the hovering point for each cluster in the ground IoT network. Simulation results show that the trained model by the proposed algorithm has a good generalization ability to generate solutions for IoT networks with different numbers of ground clusters, without the need to retrain the model. Furthermore, results show that our proposed algorithm can find better UAV trajectories with the minimum total AoI when compared to other algorithms.
- [397] arXiv:2401.02427 (cross-list from cs.NI) [ pdf , other ]
-
Title: 5G Positioning Advancements with AI/MLSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: This paper provides a comprehensive review of AI/ML-based direct positioning within 5G systems, focusing on its potential in challenging scenarios and conditions where conventional methods often fall short. Building upon the insights from the technical report TR38.843, we examine the Life Cycle Management (LCM) with a focus on to the aspects associated direct positioning process. We highlight significant simulation results and key observations from the report on the direct positioning under the various challenging conditions. Additionally, we discuss selected solutions that address measurement reporting, data collection, and model management, emphasizing their importance for advancing direct positioning.
- [398] arXiv:2401.02429 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: Brain-Inspired Spiking Neural Networks for Industrial Fault Diagnosis: A Survey, Challenges, and OpportunitiesComments: Under ReviewSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In recent decades, Industrial Fault Diagnosis (IFD) has emerged as a crucial discipline concerned with detecting and gathering vital information about industrial equipment's health condition, thereby facilitating the identification of failure types and severities. The pursuit of precise and effective fault recognition has garnered substantial attention, culminating in a focus on automating equipment monitoring to preclude safety accidents and reduce reliance on human labor. The advent of artificial neural networks (ANNs) has been instrumental in augmenting intelligent IFD algorithms, particularly in the context of big data. Despite these advancements, ANNs, being a simplified biomimetic neural network model, exhibit inherent limitations such as resource and data dependencies and restricted cognitive capabilities. To address these limitations, the third-generation Spiking Neural Network (SNN), founded on principles of Brain-inspired computing, has surfaced as a promising alternative. The SNN, characterized by its biological neuron dynamics and spiking information encoding, demonstrates exceptional potential in representing spatiotemporal features. Consequently, developing SNN-based IFD models has gained momentum, displaying encouraging performance. Nevertheless, this field lacks systematic surveys to illustrate the current situation, challenges, and future directions. Therefore, this paper systematically reviews the theoretical progress of SNN-based models to answer the question of what SNN is. Subsequently, it reviews and analyzes existing SNN-based IFD models to explain why SNN needs to be used and how to use it. More importantly, this paper systematically answers the challenges, solutions, and opportunities of SNN in IFD.
- [399] arXiv:2401.02430 (cross-list from cs.CV) [ pdf , other ]
-
Title: Automated Classification of Model Errors on ImageNetComments: NeurIPS 2023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: While the ImageNet dataset has been driving computer vision research over the past decade, significant label noise and ambiguity have made top-1 accuracy an insufficient measure of further progress. To address this, new label-sets and evaluation protocols have been proposed for ImageNet showing that state-of-the-art models already achieve over 95% accuracy and shifting the focus on investigating why the remaining errors persist.
Recent work in this direction employed a panel of experts to manually categorize all remaining classification errors for two selected models. However, this process is time-consuming, prone to inconsistencies, and requires trained experts, making it unsuitable for regular model evaluation thus limiting its utility. To overcome these limitations, we propose the first automated error classification framework, a valuable tool to study how modeling choices affect error distributions. We use our framework to comprehensively evaluate the error distribution of over 900 models. Perhaps surprisingly, we find that across model architectures, scales, and pre-training corpora, top-1 accuracy is a strong predictor for the portion of all error types. In particular, we observe that the portion of severe errors drops significantly with top-1 accuracy indicating that, while it underreports a model's true performance, it remains a valuable performance metric.
We release all our code at this https URL . - [400] arXiv:2401.02433 (cross-list from cs.CV) [ pdf , other ]
-
Title: FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-ClientsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: With the rapid development of imaging sensor technology in the field of remote sensing, multi-modal remote sensing data fusion has emerged as a crucial research direction for land cover classification tasks. While diffusion models have made great progress in generative models and image classification tasks, existing models primarily focus on single-modality and single-client control, that is, the diffusion process is driven by a single modal in a single computing node. To facilitate the secure fusion of heterogeneous data from clients, it is necessary to enable distributed multi-modal control, such as merging the hyperspectral data of organization A and the LiDAR data of organization B privately on each base station client. In this study, we propose a multi-modal collaborative diffusion federated learning framework called FedDiff. Our framework establishes a dual-branch diffusion model feature extraction setup, where the two modal data are inputted into separate branches of the encoder. Our key insight is that diffusion models driven by different modalities are inherently complementary in terms of potential denoising steps on which bilateral connections can be built. Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure, and introduce a lightweight communication module. Qualitative and quantitative experiments validate the superiority of our framework in terms of image quality and conditional consistency.
- [401] arXiv:2401.02452 (cross-list from cs.CY) [ pdf , other ]
-
Title: The Compute Divide in Machine Learning: A Threat to Academic Contribution and Scrutiny?Authors: Tamay Besiroglu , Sage Andrus Bergerson , Amelia Michael , Lennart Heim , Xueyun Luo , Neil ThompsonSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: There are pronounced differences in the extent to which industrial and academic AI labs use computing resources. We provide a data-driven survey of the role of the compute divide in shaping machine learning research. We show that a compute divide has coincided with a reduced representation of academic-only research teams in compute intensive research topics, especially foundation models. We argue that, academia will likely play a smaller role in advancing the associated techniques, providing critical evaluation and scrutiny, and in the diffusion of such models. Concurrent with this change in research focus, there is a noticeable shift in academic research towards embracing open source, pre-trained models developed within the industry. To address the challenges arising from this trend, especially reduced scrutiny of influential models, we recommend approaches aimed at thoughtfully expanding academic insights. Nationally-sponsored computing infrastructure coupled with open science initiatives could judiciously boost academic compute access, prioritizing research on interpretability, safety and security. Structured access programs and third-party auditing may also allow measured external evaluation of industry systems.
- [402] arXiv:2401.02456 (cross-list from cs.LG) [ pdf , other ]
-
Title: A comprehensive survey of research towards AI-enabled unmanned aerial systems in pre-, active-, and post-wildfire managementAuthors: Sayed Pedram Haeri Boroujeni , Abolfazl Razi , Sahand Khoshdel , Fatemeh Afghah , Janice L. Coen , Leo ONeill , Peter Z. Fule , Adam Watts , Nick-Marios T. Kokolakis , Kyriakos G. VamvoudakisSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Wildfires have emerged as one of the most destructive natural disasters worldwide, causing catastrophic losses in both human lives and forest wildlife. Recently, the use of Artificial Intelligence (AI) in wildfires, propelled by the integration of Unmanned Aerial Vehicles (UAVs) and deep learning models, has created an unprecedented momentum to implement and develop more effective wildfire management. Although some of the existing survey papers have explored various learning-based approaches, a comprehensive review emphasizing the application of AI-enabled UAV systems and their subsequent impact on multi-stage wildfire management is notably lacking. This survey aims to bridge these gaps by offering a systematic review of the recent state-of-the-art technologies, highlighting the advancements of UAV systems and AI models from pre-fire, through the active-fire stage, to post-fire management. To this aim, we provide an extensive analysis of the existing remote sensing systems with a particular focus on the UAV advancements, device specifications, and sensor technologies relevant to wildfire management. We also examine the pre-fire and post-fire management approaches, including fuel monitoring, prevention strategies, as well as evacuation planning, damage assessment, and operation strategies. Additionally, we review and summarize a wide range of computer vision techniques in active-fire management, with an emphasis on Machine Learning (ML), Reinforcement Learning (RL), and Deep Learning (DL) algorithms for wildfire classification, segmentation, detection, and monitoring tasks. Ultimately, we underscore the substantial advancement in wildfire modeling through the integration of cutting-edge AI techniques and UAV-based data, providing novel insights and enhanced predictive capabilities to understand dynamic wildfire behavior.
- [403] arXiv:2401.02457 (cross-list from cs.LG) [ pdf , other ]
-
Title: eCIL-MU: Embedding based Class Incremental Learning and Machine UnlearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: New categories may be introduced over time, or existing categories may need to be reclassified. Class incremental learning (CIL) is employed for the gradual acquisition of knowledge about new categories while preserving information about previously learned ones in such dynamic environments. It might also be necessary to also eliminate the influence of related categories on the model to adapt to reclassification. We thus introduce class-level machine unlearning (MU) within CIL. Typically, MU methods tend to be time-consuming and can potentially harm the model's performance. A continuous stream of unlearning requests could lead to catastrophic forgetting. To address these issues, we propose a non-destructive eCIL-MU framework based on embedding techniques to map data into vectors and then be stored in vector databases. Our approach exploits the overlap between CIL and MU tasks for acceleration. Experiments demonstrate the capability of achieving unlearning effectiveness and orders of magnitude (upto $\sim 278\times$) of acceleration.
- [404] arXiv:2401.02458 (cross-list from cs.LG) [ pdf , other ]
-
Title: Data-Centric Foundation Models in Computational Healthcare: A SurveyAuthors: Yunkun Zhang , Jin Gao , Zheling Tan , Lingfeng Zhou , Kexin Ding , Mu Zhou , Shaoting Zhang , Dequan WangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The advent of foundation models (FMs) as an emerging suite of AI techniques has struck a wave of opportunities in computational healthcare. The interactive nature of these models, guided by pre-training data and human instructions, has ignited a data-centric AI paradigm that emphasizes better data characterization, quality, and scale. In healthcare AI, obtaining and processing high-quality clinical data records has been a longstanding challenge, ranging from data quantity, annotation, patient privacy, and ethics. In this survey, we investigate a wide range of data-centric approaches in the FM era (from model pre-training to inference) towards improving the healthcare workflow. We discuss key perspectives in AI security, assessment, and alignment with human values. Finally, we offer a promising outlook of FM-based analytics to enhance the performance of patient outcome and clinical workflow in the evolving landscape of healthcare and medicine. We provide an up-to-date list of healthcare-related foundation models and datasets at this https URL .
- [405] arXiv:2401.02465 (cross-list from cs.LG) [ pdf , other ]
-
Title: Interpretable Time Series Models for Wastewater Modeling in Combined Sewer OverflowsComments: 8 pages, 5 figures, 2 tables, presented at iSCSi 2023 LisbonSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Climate change poses increasingly complex challenges to our society. Extreme weather events such as floods, wild fires or droughts are becoming more frequent, spontaneous and difficult to foresee or counteract. In this work we specifically address the problem of sewage water polluting surface water bodies after spilling over from rain tanks as a consequence of heavy rain events. We investigate to what extent state-of-the-art interpretable time series models can help predict such critical water level points, so that the excess can promptly be redistributed across the sewage network. Our results indicate that modern time series models can contribute to better waste water management and prevention of environmental pollution from sewer systems. All the code and experiments can be found in our repository: this https URL .
- [406] arXiv:2401.02511 (cross-list from eess.SY) [ pdf , other ]
-
Title: Gain Scheduling with a Neural Operator for a Transport PDE with Nonlinear RecirculationComments: 16 pages, 5 figuresSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Abstract: To stabilize PDE models, control laws require space-dependent functional gains mapped by nonlinear operators from the PDE functional coefficients. When a PDE is nonlinear and its "pseudo-coefficient" functions are state-dependent, a gain-scheduling (GS) nonlinear design is the simplest approach to the design of nonlinear feedback. The GS version of PDE backstepping employs gains obtained by solving a PDE at each value of the state. Performing such PDE computations in real time may be prohibitive. The recently introduced neural operators (NO) can be trained to produce the gain functions, rapidly in real time, for each state value, without requiring a PDE solution. In this paper we introduce NOs for GS-PDE backstepping. GS controllers act on the premise that the state change is slow and, as a result, guarantee only local stability, even for ODEs. We establish local stabilization of hyperbolic PDEs with nonlinear recirculation using both a "full-kernel" approach and the "gain-only" approach to gain operator approximation. Numerical simulations illustrate stabilization and demonstrate speedup by three orders of magnitude over traditional PDE gain-scheduling. Code (Github) for the numerical implementation is published to enable exploration.
- [407] arXiv:2401.02516 (cross-list from eess.SY) [ pdf , other ]
-
Title: Moving-Horizon Estimators for Hyperbolic and Parabolic PDEs in 1-DComments: 7 pages, 1 figure, submitted to ACC 2024Subjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Analysis of PDEs (math.AP); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Abstract: Observers for PDEs are themselves PDEs. Therefore, producing real time estimates with such observers is computationally burdensome. For both finite-dimensional and ODE systems, moving-horizon estimators (MHE) are operators whose output is the state estimate, while their inputs are the initial state estimate at the beginning of the horizon as well as the measured output and input signals over the moving time horizon. In this paper we introduce MHEs for PDEs which remove the need for a numerical solution of an observer PDE in real time. We accomplish this using the PDE backstepping method which, for certain classes of both hyperbolic and parabolic PDEs, produces moving-horizon state estimates explicitly. Precisely, to explicitly produce the state estimates, we employ a backstepping transformation of a hard-to-solve observer PDE into a target observer PDE, which is explicitly solvable. The MHEs we propose are not new observer designs but simply the explicit MHE realizations, over a moving horizon of arbitrary length, of the existing backstepping observers. Our PDE MHEs lack the optimality of the MHEs that arose as duals of MPC, but they are given explicitly, even for PDEs. In the paper we provide explicit formulae for MHEs for both hyperbolic and parabolic PDEs, as well as simulation results that illustrate theoretically guaranteed convergence of the MHEs.
- [408] arXiv:2401.02523 (cross-list from cs.CV) [ pdf , other ]
-
Title: Image-based Deep Learning for Smart Digital Twins: a ReviewAuthors: Md Ruman Islam , Mahadevan Subramaniam , Pei-Chi Huang (Department of Computer Science, University of Nebraska at Omaha, Omaha, NE, USA)Comments: 12 pages, 2 figures, and 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: Smart Digital twins (SDTs) are being increasingly used to virtually replicate and predict the behaviors of complex physical systems through continual data assimilation enabling the optimization of the performance of these systems by controlling the actions of systems. Recently, deep learning (DL) models have significantly enhanced the capabilities of SDTs, particularly for tasks such as predictive maintenance, anomaly detection, and optimization. In many domains, including medicine, engineering, and education, SDTs use image data (image-based SDTs) to observe and learn system behaviors and control their behaviors. This paper focuses on various approaches and associated challenges in developing image-based SDTs by continually assimilating image data from physical systems. The paper also discusses the challenges involved in designing and implementing DL models for SDTs, including data acquisition, processing, and interpretation. In addition, insights into the future directions and opportunities for developing new image-based DL approaches to develop robust SDTs are provided. This includes the potential for using generative models for data augmentation, developing multi-modal DL models, and exploring the integration of DL with other technologies, including 5G, edge computing, and IoT. In this paper, we describe the image-based SDTs, which enable broader adoption of the digital twin DT paradigms across a broad spectrum of areas and the development of new methods to improve the abilities of SDTs in replicating, predicting, and optimizing the behavior of complex systems.
- [409] arXiv:2401.02524 (cross-list from cs.LG) [ pdf , other ]
-
Title: Comprehensive Exploration of Synthetic Data Generation: A SurveyAuthors: André Bauer , Simon Trapp , Michael Stenger , Robert Leppich , Samuel Kounev , Mark Leznik , Kyle Chard , Ian FosterComments: Fixed bug in Figure 44Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Recent years have witnessed a surge in the popularity of Machine Learning (ML), applied across diverse domains. However, progress is impeded by the scarcity of training data due to expensive acquisition and privacy legislation. Synthetic data emerges as a solution, but the abundance of released models and limited overview literature pose challenges for decision-making. This work surveys 417 Synthetic Data Generation (SDG) models over the last decade, providing a comprehensive overview of model types, functionality, and improvements. Common attributes are identified, leading to a classification and trend analysis. The findings reveal increased model performance and complexity, with neural network-based approaches prevailing, except for privacy-preserving data generation. Computer vision dominates, with GANs as primary generative models, while diffusion models, transformers, and RNNs compete. Implications from our performance evaluation highlight the scarcity of common metrics and datasets, making comparisons challenging. Additionally, the neglect of training and computational costs in literature necessitates attention in future research. This work serves as a guide for SDG model selection and identifies crucial areas for future exploration.
- [410] arXiv:2401.02542 (cross-list from cs.SI) [ pdf , other ]
-
Title: A Community Detection and Graph Neural Network Based Link Prediction Approach for Scientific LiteratureSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI)
Abstract: This study presents a novel approach that synergizes community detection algorithms with various Graph Neural Network (GNN) models to bolster link prediction in scientific literature networks. By integrating the Louvain community detection algorithm into our GNN frameworks, we consistently enhance performance across all models tested. For example, integrating Louvain with the GAT model resulted in an AUC score increase from 0.777 to 0.823, exemplifying the typical improvements observed. Similar gains are noted when Louvain is paired with other GNN architectures, confirming the robustness and effectiveness of incorporating community-level insights. This consistent uplift in performance reflected in our extensive experimentation on bipartite graphs of scientific collaborations and citations highlights the synergistic potential of combining community detection with GNNs to overcome common link prediction challenges such as scalability and resolution limits. Our findings advocate for the integration of community structures as a significant step forward in the predictive accuracy of network science models, offering a comprehensive understanding of scientific collaboration patterns through the lens of advanced machine learning techniques.
- [411] arXiv:2401.02575 (cross-list from cs.SI) [ pdf , other ]
-
Title: Large Language Models for Social Networks: Applications, Challenges, and SolutionsAuthors: Jingying Zeng , Richard Huang , Waleed Malik , Langxuan Yin , Bojan Babic , Danny Shacham , Xiao Yan , Jaewon Yang , Qi HeSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) are transforming the way people generate, explore, and engage with content. We study how we can develop LLM applications for online social networks. Despite LLMs' successes in other domains, it is challenging to develop LLM-based products for social networks for numerous reasons, and it has been relatively under-reported in the research community. We categorize LLM applications for social networks into three categories. First is knowledge tasks where users want to find new knowledge and information, such as search and question-answering. Second is entertainment tasks where users want to consume interesting content, such as getting entertaining notification content. Third is foundational tasks that need to be done to moderate and operate the social networks, such as content annotation and LLM monitoring. For each task, we share the challenges we found, solutions we developed, and lessons we learned. To the best of our knowledge, this is the first comprehensive paper about developing LLM applications for social networks.
- [412] arXiv:2401.02576 (cross-list from cs.LG) [ pdf , other ]
-
Title: t-DGR: A Trajectory-Based Deep Generative Replay Method for Continual Learning in Decision MakingComments: 2nd Workshop on Agent Learning in Open-Endedness (ALOE) at NeurIPS 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: Deep generative replay has emerged as a promising approach for continual learning in decision-making tasks. This approach addresses the problem of catastrophic forgetting by leveraging the generation of trajectories from previously encountered tasks to augment the current dataset. However, existing deep generative replay methods for continual learning rely on autoregressive models, which suffer from compounding errors in the generated trajectories. In this paper, we propose a simple, scalable, and non-autoregressive method for continual learning in decision-making tasks using a generative model that generates task samples conditioned on the trajectory timestep. We evaluate our method on Continual World benchmarks and find that our approach achieves state-of-the-art performance on the average success rate metric among continual learning methods. Code is available at this https URL .
- [413] arXiv:2401.02586 (cross-list from cs.LG) [ pdf , other ]
-
Title: Federated Learning for distribution skewed data using sample weightsComments: Accepted to IEEE Transaction on Artificial IntelligenceJournal-ref: IEEE Transaction on Artificial Intelligence 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: One of the most challenging issues in federated learning is that the data is often not independent and identically distributed (nonIID). Clients are expected to contribute the same type of data and drawn from one global distribution. However, data are often collected in different ways from different resources. Thus, the data distributions among clients might be different from the underlying global distribution. This creates a weight divergence issue and reduces federated learning performance. This work focuses on improving federated learning performance for skewed data distribution across clients. The main idea is to adjust the client distribution closer to the global distribution using sample weights. Thus, the machine learning model converges faster with higher accuracy. We start from the fundamental concept of empirical risk minimization and theoretically derive a solution for adjusting the distribution skewness using sample weights. To determine sample weights, we implicitly exchange density information by leveraging a neural network-based density estimation model, MADE. The clients data distribution can then be adjusted without exposing their raw data. Our experiment results on three real-world datasets show that the proposed method not only improves federated learning accuracy but also significantly reduces communication costs compared to the other experimental methods.
- [414] arXiv:2401.02588 (cross-list from cs.CV) [ pdf , other ]
-
Title: Characterizing Satellite Geometry via Accelerated 3D Gaussian SplattingComments: 11 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The accelerating deployment of spacecraft in orbit have generated interest in on-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possible unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. This requires robust characterization of the target's geometry. In this article, we present an approach for mapping geometries of satellites on orbit based on 3D Gaussian Splatting that can run on computing resources available on current spaceflight hardware. We demonstrate model training and 3D rendering performance on a hardware-in-the-loop satellite mock-up under several realistic lighting and motion conditions. Our model is shown to be capable of training on-board and rendering higher quality novel views of an unknown satellite nearly 2 orders of magnitude faster than previous NeRF-based algorithms. Such on-board capabilities are critical to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control tasks.
- [415] arXiv:2401.02591 (cross-list from cs.LG) [ pdf , other ]
-
Title: Synthetic Information towards Maximum Posterior Ratio for deep learning on Imbalanced DataComments: Accepted to IEEE Transaction on Artificial IntelligenceJournal-ref: IEEE Transaction on Artificial Intelligence 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This study examines the impact of class-imbalanced data on deep learning models and proposes a technique for data balancing by generating synthetic data for the minority class. Unlike random-based oversampling, our method prioritizes balancing the informative regions by identifying high entropy samples. Generating well-placed synthetic data can enhance machine learning algorithms accuracy and efficiency, whereas poorly-placed ones may lead to higher misclassification rates. We introduce an algorithm that maximizes the probability of generating a synthetic sample in the correct region of its class by optimizing the class posterior ratio. Additionally, to maintain data topology, synthetic data are generated within each minority sample's neighborhood. Our experimental results on forty-one datasets demonstrate the superior performance of our technique in enhancing deep-learning models.
- [416] arXiv:2401.02600 (cross-list from cs.CV) [ pdf , other ]
-
Title: Object-oriented backdoor attack against image captioningSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Backdoor attack against image classification task has been widely studied and proven to be successful, while there exist little research on the backdoor attack against vision-language models. In this paper, we explore backdoor attack towards image captioning models by poisoning training data. Assuming the attacker has total access to the training dataset, and cannot intervene in model construction or training process. Specifically, a portion of benign training samples is randomly selected to be poisoned. Afterwards, considering that the captions are usually unfolded around objects in an image, we design an object-oriented method to craft poisons, which aims to modify pixel values by a slight range with the modification number proportional to the scale of the current detected object region. After training with the poisoned data, the attacked model behaves normally on benign images, but for poisoned images, the model will generate some sentences irrelevant to the given image. The attack controls the model behavior on specific test images without sacrificing the generation performance on benign test images. Our method proves the weakness of image captioning models to backdoor attack and we hope this work can raise the awareness of defending against backdoor attack in the image captioning field.
- [417] arXiv:2401.02602 (cross-list from cs.LG) [ pdf , other ]
-
Title: Neural Causal AbstractionsComments: 48 total pages, 20 figures, short version accepted to AAAI-24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The abilities of humans to understand the world in terms of cause and effect relationships, as well as to compress information into abstract concepts, are two hallmark features of human intelligence. These two topics have been studied in tandem in the literature under the rubric of causal abstractions theory. In practice, it remains an open problem how to best leverage abstraction theory in real-world causal inference tasks, where the true mechanisms are unknown and only limited data is available. In this paper, we develop a new family of causal abstractions by clustering variables and their domains. This approach refines and generalizes previous notions of abstractions to better accommodate individual causal distributions that are spawned by Pearl's causal hierarchy. We show that such abstractions are learnable in practical settings through Neural Causal Models (Xia et al., 2021), enabling the use of the deep learning toolkit to solve various challenging causal inference tasks -- identification, estimation, sampling -- at different levels of granularity. Finally, we integrate these results with representation learning to create more flexible abstractions, moving these results closer to practical applications. Our experiments support the theory and illustrate how to scale causal inferences to high-dimensional settings involving image data.
- [418] arXiv:2401.02627 (cross-list from cs.CY) [ pdf , other ]
-
Title: Characteristics and prevalence of fake social media profiles with AI-generated facesComments: 18 pages, 6 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Recent advancements in generative artificial intelligence (AI) have raised concerns about their potential to create convincing fake social media accounts, but empirical evidence is lacking. In this paper, we present a systematic analysis of Twitter(X) accounts using human faces generated by Generative Adversarial Networks (GANs) for their profile pictures. We present a dataset of 1,353 such accounts and show that they are used to spread scams, spam, and amplify coordinated messages, among other inauthentic activities. Leveraging a feature of GAN-generated faces -- consistent eye placement -- and supplementing it with human annotation, we devise an effective method for identifying GAN-generated profiles in the wild. Applying this method to a random sample of active Twitter users, we estimate a lower bound for the prevalence of profiles using GAN-generated faces between 0.021% and 0.044% -- around 10K daily active accounts. These findings underscore the emerging threats posed by multimodal generative AI. We release the source code of our detection method and the data we collect to facilitate further investigation. Additionally, we provide practical heuristics to assist social media users in recognizing such accounts.
- [419] arXiv:2401.02644 (cross-list from cs.LG) [ pdf , other ]
-
Title: Simple Hierarchical Planning with DiffusionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets. However, they often face computational challenges and can falter in generalization, especially in capturing temporal abstractions for long-horizon tasks. To overcome this, we introduce the Hierarchical Diffuser, a simple, fast, yet surprisingly effective planning method combining the advantages of hierarchical and diffusion-based planning. Our model adopts a "jumpy" planning strategy at the higher level, which allows it to have a larger receptive field but at a lower computational cost -- a crucial factor for diffusion-based planning methods, as we have empirically verified. Additionally, the jumpy sub-goals guide our low-level planner, facilitating a fine-tuning stage and further improving our approach's effectiveness. We conducted empirical evaluations on standard offline reinforcement learning benchmarks, demonstrating our method's superior performance and efficiency in terms of training and planning speed compared to the non-hierarchical Diffuser as well as other hierarchical planning methods. Moreover, we explore our model's generalization capability, particularly on how our method improves generalization capabilities on compositional out-of-distribution tasks.
- [420] arXiv:2401.02652 (cross-list from cs.LG) [ pdf , other ]
-
Title: Adaptive Discounting of Training Time AttacksComments: 19 pages, 7 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Among the most insidious attacks on Reinforcement Learning (RL) solutions are training-time attacks (TTAs) that create loopholes and backdoors in the learned behaviour. Not limited to a simple disruption, constructive TTAs (C-TTAs) are now available, where the attacker forces a specific, target behaviour upon a training RL agent (victim). However, even state-of-the-art C-TTAs focus on target behaviours that could be naturally adopted by the victim if not for a particular feature of the environment dynamics, which C-TTAs exploit. In this work, we show that a C-TTA is possible even when the target behaviour is un-adoptable due to both environment dynamics as well as non-optimality with respect to the victim objective(s). To find efficient attacks in this context, we develop a specialised flavour of the DDPG algorithm, which we term gammaDDPG, that learns this stronger version of C-TTA. gammaDDPG dynamically alters the attack policy planning horizon based on the victim's current behaviour. This improves effort distribution throughout the attack timeline and reduces the effect of uncertainty the attacker has about the victim. To demonstrate the features of our method and better relate the results to prior research, we borrow a 3D grid domain from a state-of-the-art C-TTA for our experiments. Code is available at " this http URL .
- [421] arXiv:2401.02653 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Deep Q-Learning based Smart Scheduling of EVs for Demand Response in Smart GridsComments: Submitted to journalSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Economic and policy factors are driving the continuous increase in the adoption and usage of electrical vehicles (EVs). However, despite being a cleaner alternative to combustion engine vehicles, EVs have negative impacts on the lifespan of microgrid equipment and energy balance due to increased power demand and the timing of their usage. In our view grid management should leverage on EVs scheduling flexibility to support local network balancing through active participation in demand response programs. In this paper, we propose a model-free solution, leveraging Deep Q-Learning to schedule the charging and discharging activities of EVs within a microgrid to align with a target energy profile provided by the distribution system operator. We adapted the Bellman Equation to assess the value of a state based on specific rewards for EV scheduling actions and used a neural network to estimate Q-values for available actions and the epsilon-greedy algorithm to balance exploitation and exploration to meet the target energy profile. The results are promising showing that the proposed solution can effectively schedule the EVs charging and discharging actions to align with the target profile with a Person coefficient of 0.99, handling effective EVs scheduling situations that involve dynamicity given by the e-mobility features, relying only on data with no knowledge of EVs and microgrid dynamics.
- [422] arXiv:2401.02661 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Nurse-in-the-Loop Artificial Intelligence for Precision Management of Type 2 Diabetes in a Clinical Trial Utilizing Transfer-Learned Predictive Digital TwinComments: Submitted for reviewSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Background: Type 2 diabetes (T2D) is a prevalent chronic disease with a significant risk of serious health complications and negative impacts on the quality of life. Given the impact of individual characteristics and lifestyle on the treatment plan and patient outcomes, it is crucial to develop precise and personalized management strategies. Artificial intelligence (AI) provides great promise in combining patterns from various data sources with nurses' expertise to achieve optimal care. Methods: This is a 6-month ancillary study among T2D patients (n = 20, age = 57 +- 10). Participants were randomly assigned to an intervention (AI, n=10) group to receive daily AI-generated individualized feedback or a control group without receiving the daily feedback (non-AI, n=10) in the last three months. The study developed an online nurse-in-the-loop predictive control (ONLC) model that utilizes a predictive digital twin (PDT). The PDT was developed using a transfer-learning-based Artificial Neural Network. The PDT was trained on participants self-monitoring data (weight, food logs, physical activity, glucose) from the first three months, and the online control algorithm applied particle swarm optimization to identify impactful behavioral changes for maintaining the patient's glucose and weight levels for the next three months. The ONLC provided the intervention group with individualized feedback and recommendations via text messages. The PDT was re-trained weekly to improve its performance. Findings: The trained ONLC model achieved >=80% prediction accuracy across all patients while the model was tuned online. Participants in the intervention group exhibited a trend of improved daily steps and stable or improved total caloric and total carb intake as recommended.
- [423] arXiv:2401.02663 (cross-list from cs.LG) [ pdf , other ]
-
Title: A backdoor attack against link prediction tasks with graph neural networksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Graph Neural Networks (GNNs) are a class of deep learning models capable of processing graph-structured data, and they have demonstrated significant performance in a variety of real-world applications. Recent studies have found that GNN models are vulnerable to backdoor attacks. When specific patterns (called backdoor triggers, e.g., subgraphs, nodes, etc.) appear in the input data, the backdoor embedded in the GNN models is activated, which misclassifies the input data into the target class label specified by the attacker, whereas when there are no backdoor triggers in the input, the backdoor embedded in the GNN models is not activated, and the models work normally. Backdoor attacks are highly stealthy and expose GNN models to serious security risks. Currently, research on backdoor attacks against GNNs mainly focus on tasks such as graph classification and node classification, and backdoor attacks against link prediction tasks are rarely studied. In this paper, we propose a backdoor attack against the link prediction tasks based on GNNs and reveal the existence of such security vulnerability in GNN models, which make the backdoored GNN models to incorrectly predict unlinked two nodes as having a link relationship when a trigger appear. The method uses a single node as the trigger and poison selected node pairs in the training graph, and then the backdoor will be embedded in the GNN models through the training process. In the inference stage, the backdoor in the GNN models can be activated by simply linking the trigger node to the two end nodes of the unlinked node pairs in the input data, causing the GNN models to produce incorrect link prediction results for the target node pairs.
- [424] arXiv:2401.02665 (cross-list from cs.LG) [ pdf , other ]
-
Title: Zero-shot Microclimate Prediction with Deep LearningJournal-ref: Tackling Climate Change with Machine Learning: workshop at NeurIPS 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)
Abstract: Weather station data is a valuable resource for climate prediction, however, its reliability can be limited in remote locations. To compound the issue, making local predictions often relies on sensor data that may not be accessible for a new, previously unmonitored location. In response to these challenges, we propose a novel zero-shot learning approach designed to forecast various climate measurements at new and unmonitored locations. Our method surpasses conventional weather forecasting techniques in predicting microclimate variables by leveraging knowledge extracted from other geographic locations.
- [425] arXiv:2401.02677 (cross-list from cs.CV) [ pdf , other ]
-
Title: Progressive Knowledge Distillation Of Stable Diffusion XL Using Layer Level LossSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Stable Diffusion XL (SDXL) has become the best open source text-to-image model (T2I) for its versatility and top-notch image quality. Efficiently addressing the computational demands of SDXL models is crucial for wider reach and applicability. In this work, we introduce two scaled-down variants, Segmind Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B and 0.74B parameter UNets, respectively, achieved through progressive removal using layer-level losses focusing on reducing the model size while preserving generative quality. We release these models weights at this https URL . Our methodology involves the elimination of residual networks and transformer blocks from the U-Net structure of SDXL, resulting in significant reductions in parameters, and latency. Our compact models effectively emulate the original SDXL by capitalizing on transferred knowledge, achieving competitive results against larger multi-billion parameter SDXL. Our work underscores the efficacy of knowledge distillation coupled with layer-level losses in reducing model size while preserving the high-quality generative capabilities of SDXL, thus facilitating more accessible deployment in resource-constrained environments.
- [426] arXiv:2401.02683 (cross-list from cs.LG) [ pdf , other ]
-
Title: Geometric-Facilitated Denoising Diffusion Model for 3D Molecule GenerationComments: 9 pages, 6 figures, AAAI-24 Main TrackJournal-ref: Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 1 (March 25, 2024): 338-346Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
Abstract: Denoising diffusion models have shown great potential in multiple research areas. Existing diffusion-based generative methods on de novo 3D molecule generation face two major challenges. Since majority heavy atoms in molecules allow connections to multiple atoms through single bonds, solely using pair-wise distance to model molecule geometries is insufficient. Therefore, the first one involves proposing an effective neural network as the denoising kernel that is capable to capture complex multi-body interatomic relationships and learn high-quality features. Due to the discrete nature of graphs, mainstream diffusion-based methods for molecules heavily rely on predefined rules and generate edges in an indirect manner. The second challenge involves accommodating molecule generation to diffusion and accurately predicting the existence of bonds. In our research, we view the iterative way of updating molecule conformations in diffusion process is consistent with molecular dynamics and introduce a novel molecule generation method named Geometric-Facilitated Molecular Diffusion (GFMDiff). For the first challenge, we introduce a Dual-Track Transformer Network (DTN) to fully excevate global spatial relationships and learn high quality representations which contribute to accurate predictions of features and geometries. As for the second challenge, we design Geometric-Facilitated Loss (GFLoss) which intervenes the formation of bonds during the training period, instead of directly embedding edges into the latent space. Comprehensive experiments on current benchmarks demonstrate the superiority of GFMDiff.
- [427] arXiv:2401.02708 (cross-list from cs.LG) [ pdf , other ]
-
Title: TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival AnalysisComments: 9 pages,6 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: A core challenge in survival analysis is to model the distribution of censored time-to-event data, where the event of interest may be a death, failure, or occurrence of a specific event. Previous studies have showed that ranking and maximum likelihood estimation (MLE)loss functions are widely-used for survival analysis. However, ranking loss only focus on the ranking of survival time and does not consider potential effect of samples for exact survival time values. Furthermore, the MLE is unbounded and easily subject to outliers (e.g., censored data), which may cause poor performance of modeling. To handle the complexities of learning process and exploit valuable survival time values, we propose a time-adaptive coordinate loss function, TripleSurv, to achieve adaptive adjustments by introducing the differences in the survival time between sample pairs into the ranking, which can encourage the model to quantitatively rank relative risk of pairs, ultimately enhancing the accuracy of predictions. Most importantly, the TripleSurv is proficient in quantifying the relative risk between samples by ranking ordering of pairs, and consider the time interval as a trade-off to calibrate the robustness of model over sample distribution. Our TripleSurv is evaluated on three real-world survival datasets and a public synthetic dataset. The results show that our method outperforms the state-of-the-art methods and exhibits good model performance and robustness on modeling various sophisticated data distributions with different censor rates. Our code will be available upon acceptance.
- [428] arXiv:2401.02709 (cross-list from cs.CL) [ pdf , other ]
-
Title: German Text Embedding Clustering BenchmarkComments: 15 pages, 4 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This work introduces a benchmark assessing the performance of clustering German text embeddings in different domains. This benchmark is driven by the increasing use of clustering neural text embeddings in tasks that require the grouping of texts (such as topic modeling) and the need for German resources in existing benchmarks. We provide an initial analysis for a range of pre-trained mono- and multilingual models evaluated on the outcome of different clustering algorithms. Results include strong performing mono- and multilingual models. Reducing the dimensions of embeddings can further improve clustering. Additionally, we conduct experiments with continued pre-training for German BERT models to estimate the benefits of this additional training. Our experiments suggest that significant performance improvements are possible for short text. All code and datasets are publicly available.
- [429] arXiv:2401.02710 (cross-list from cs.CE) [ pdf , other ]
-
Title: Synergistic Formulaic Alpha Generation for Quantitative Trading based on Reinforcement LearningComments: Accepted by ICOIN 2024Subjects: Computational Engineering, Finance, and Science (cs.CE) ; Artificial Intelligence (cs.AI)
Abstract: Mining of formulaic alpha factors refers to the process of discovering and developing specific factors or indicators (referred to as alpha factors) for quantitative trading in stock market. To efficiently discover alpha factors in vast search space, reinforcement learning (RL) is commonly employed. This paper proposes a method to enhance existing alpha factor mining approaches by expanding a search space and utilizing pretrained formulaic alpha set as initial seed values to generate synergistic formulaic alpha. We employ information coefficient (IC) and rank information coefficient (Rank IC) as performance evaluation metrics for the model. Using CSI300 market data, we conducted real investment simulations and observed significant performance improvement compared to existing techniques.
- [430] arXiv:2401.02713 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph-level Protein Representation Learning by Structure Knowledge RefinementSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM)
Abstract: This paper focuses on learning representation on the whole graph level in an unsupervised manner. Learning graph-level representation plays an important role in a variety of real-world issues such as molecule property prediction, protein structure feature extraction, and social network analysis. The mainstream method is utilizing contrastive learning to facilitate graph feature extraction, known as Graph Contrastive Learning (GCL). GCL, although effective, suffers from some complications in contrastive learning, such as the effect of false negative pairs. Moreover, augmentation strategies in GCL are weakly adaptive to diverse graph datasets. Motivated by these problems, we propose a novel framework called Structure Knowledge Refinement (SKR) which uses data structure to determine the probability of whether a pair is positive or negative. Meanwhile, we propose an augmentation strategy that naturally preserves the semantic meaning of the original data and is compatible with our SKR framework. Furthermore, we illustrate the effectiveness of our SKR framework through intuition and experiments. The experimental results on the tasks of graph-level classification demonstrate that our SKR framework is superior to most state-of-the-art baselines.
- [431] arXiv:2401.02717 (cross-list from cs.CV) [ pdf , other ]
-
Title: Complementary Information Mutual Learning for Multimodality Medical Image SegmentationAuthors: Chuyun Shen , Wenhao Li , Haoqing Chen , Xiaoling Wang , Fengping Zhu , Yuxin Li , Xiangfeng Wang , Bo JinComments: 35 pages, 18 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Radiologists must utilize multiple modal images for tumor segmentation and diagnosis due to the limitations of medical imaging and the diversity of tumor signals. This leads to the development of multimodal learning in segmentation. However, the redundancy among modalities creates challenges for existing subtraction-based joint learning methods, such as misjudging the importance of modalities, ignoring specific modal information, and increasing cognitive load. These thorny issues ultimately decrease segmentation accuracy and increase the risk of overfitting. This paper presents the complementary information mutual learning (CIML) framework, which can mathematically model and address the negative impact of inter-modal redundant information. CIML adopts the idea of addition and removes inter-modal redundant information through inductive bias-driven task decomposition and message passing-based redundancy filtering. CIML first decomposes the multimodal segmentation task into multiple subtasks based on expert prior knowledge, minimizing the information dependence between modalities. Furthermore, CIML introduces a scheme in which each modality can extract information from other modalities additively through message passing. To achieve non-redundancy of extracted information, the redundant filtering is transformed into complementary information learning inspired by the variational information bottleneck. The complementary information learning procedure can be efficiently solved by variational inference and cross-modal spatial attention. Numerical results from the verification task and standard benchmarks indicate that CIML efficiently removes redundant information between modalities, outperforming SOTA methods regarding validation accuracy and segmentation effect.
- [432] arXiv:2401.02719 (cross-list from cs.CV) [ pdf , other ]
-
Title: Learning Image Demoireing from Unpaired Real DataComments: AAAI2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: This paper focuses on addressing the issue of image demoireing. Unlike the large volume of existing studies that rely on learning from paired real data, we attempt to learn a demoireing model from unpaired real data, i.e., moire images associated with irrelevant clean images. The proposed method, referred to as Unpaired Demoireing (UnDeM), synthesizes pseudo moire images from unpaired datasets, generating pairs with clean images for training demoireing models. To achieve this, we divide real moire images into patches and group them in compliance with their moire complexity. We introduce a novel moire generation framework to synthesize moire images with diverse moire features, resembling real moire patches, and details akin to real moire-free images. Additionally, we introduce an adaptive denoise method to eliminate the low-quality pseudo moire images that adversely impact the learning of demoireing models. We conduct extensive experiments on the commonly-used FHDMi and UHDM datasets. Results manifest that our UnDeM performs better than existing methods when using existing demoireing models such as MBCNN and ESDNet-L. Code: this https URL
- [433] arXiv:2401.02727 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Enhancing targeted transferability via feature space fine-tuningComments: 9 pages, 10 figures, accepted by 2024ICASSPSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Adversarial examples (AEs) have been extensively studied due to their potential for privacy protection and inspiring robust neural networks. Yet, making a targeted AE transferable across unknown models remains challenging. In this paper, to alleviate the overfitting dilemma common in an AE crafted by existing simple iterative attacks, we propose fine-tuning it in the feature space. Specifically, starting with an AE generated by a baseline attack, we encourage the features conducive to the target class and discourage the features to the original class in a middle layer of the source model. Extensive experiments demonstrate that only a few iterations of fine-tuning can boost existing attacks' targeted transferability nontrivially and universally. Our results also verify that the simple iterative attacks can yield comparable or even better transferability than the resource-intensive methods, which rest on training target-specific classifiers or generators with additional data. The code is available at: this http URL .
- [434] arXiv:2401.02740 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Fairness-Aware Job Scheduling for Multi-Job Federated LearningComments: accepted by ICASSP 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Federated learning (FL) enables multiple data owners (a.k.a. FL clients) to collaboratively train machine learning models without disclosing sensitive private data. Existing FL research mostly focuses on the monopoly scenario in which a single FL server selects a subset of FL clients to update their local models in each round of training. In practice, there can be multiple FL servers simultaneously trying to select clients from the same pool. In this paper, we propose a first-of-its-kind Fairness-aware Federated Job Scheduling (FairFedJS) approach to bridge this gap. Based on Lyapunov optimization, it ensures fair allocation of high-demand FL client datasets to FL jobs in need of them, by jointly considering the current demand and the job payment bids, in order to prevent prolonged waiting. Extensive experiments comparing FairFedJS against four state-of-the-art approaches on two datasets demonstrate its significant advantages. It outperforms the best baseline by 31.9% and 1.0% on average in terms of scheduling fairness and convergence time, respectively, while achieving comparable test accuracy.
- [435] arXiv:2401.02773 (cross-list from cs.LG) [ pdf , other ]
-
Title: Tackling Electrode Shift In Gesture Recognition with HD-EMG Electrode SubsetsComments: 5 pagesJournal-ref: ICASSP 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: sEMG pattern recognition algorithms have been explored extensively in decoding movement intent, yet are known to be vulnerable to changing recording conditions, exhibiting significant drops in performance across subjects, and even across sessions. Multi-channel surface EMG, also referred to as high-density sEMG (HD-sEMG) systems, have been used to improve performance with the information collected through the use of additional electrodes. However, a lack of robustness is ever present due to limited datasets and the difficulties in addressing sources of variability, such as electrode placement. In this study, we propose training on a collection of input channel subsets and augmenting our training distribution with data from different electrode locations, simultaneously targeting electrode shift and reducing input dimensionality. Our method increases robustness against electrode shift and results in significantly higher intersession performance across subjects and classification algorithms.
- [436] arXiv:2401.02777 (cross-list from cs.CL) [ pdf , other ]
-
Title: From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents. RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. It entails a comprehensive agent construction scenario, including phases like Conversation Selection, Scene Extraction, CoT Completion, and Scene Augmentation, leading to the LLMs Training phase. This approach appears to enhance agent controllability and adaptability in complex, multi-turn dialogues. Our preliminary evaluations in a real estate sales context suggest that RAISE has some advantages over traditional agents, indicating its potential for broader applications. This work contributes to the AI field by providing a robust framework for developing more context-aware and versatile conversational agents.
- [437] arXiv:2401.02797 (cross-list from cs.CL) [ pdf , other ]
-
Title: PeFoMed: Parameter Efficient Fine-tuning of Multimodal Large Language Models for Medical ImagingComments: 12 pages, 8 figures, 12 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Multimodal large language models (MLLMs) represent an evolutionary expansion in the capabilities of traditional large language models, enabling them to tackle challenges that surpass the scope of purely text-based applications. It leverages the knowledge previously encoded within these language models, thereby enhancing their applicability and functionality in the reign of multimodal contexts. Recent works investigate the adaptation of MLLMs as a universal solution to address medical multi-modal problems as a generative task. In this paper, we propose a parameter efficient framework for fine-tuning MLLMs, specifically validated on medical visual question answering (Med-VQA) and medical report generation (MRG) tasks, using public benchmark datasets. We also introduce an evaluation metric using the 5-point Likert scale and its weighted average value to measure the quality of the generated reports for MRG tasks, where the scale ratings are labelled by both humans manually and the GPT-4 model. We further assess the consistency of performance metrics across traditional measures, GPT-4, and human ratings for both VQA and MRG tasks. The results indicate that semantic similarity assessments using GPT-4 align closely with human annotators and provide greater stability, yet they reveal a discrepancy when compared to conventional lexical similarity measurements. This questions the reliability of lexical similarity metrics for evaluating the performance of generative models in Med-VQA and report generation tasks. Besides, our fine-tuned model significantly outperforms GPT-4v. This indicates that without additional fine-tuning, multi-modal models like GPT-4v do not perform effectively on medical imaging tasks. The code will be available here: this https URL .
- [438] arXiv:2401.02810 (cross-list from cs.LG) [ pdf , other ]
-
Title: Physics-Informed Neural Networks for High-Frequency and Multi-Scale Problems using Transfer LearningComments: 18 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
Abstract: Physics-informed neural network (PINN) is a data-driven solver for partial and ordinary differential equations(ODEs/PDEs). It provides a unified framework to address both forward and inverse problems. However, the complexity of the objective function often leads to training failures. This issue is particularly prominent when solving high-frequency and multi-scale problems. We proposed using transfer learning to boost the robustness and convergence of training PINN, starting training from low-frequency problems and gradually approaching high-frequency problems. Through two case studies, we discovered that transfer learning can effectively train PINN to approximate solutions from low-frequency problems to high-frequency problems without increasing network parameters. Furthermore, it requires fewer data points and less training time. We elaborately described our training strategy, including optimizer selection, and suggested guidelines for using transfer learning to train neural networks for solving more complex problems.
- [439] arXiv:2401.02838 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: CrisisViT: A Robust Vision Transformer for Crisis Image ClassificationJournal-ref: Proceedings of the 20th International ISCRAM Conference 2023, pp. 309--319Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM); Social and Information Networks (cs.SI)
Abstract: In times of emergency, crisis response agencies need to quickly and accurately assess the situation on the ground in order to deploy relevant services and resources. However, authorities often have to make decisions based on limited information, as data on affected regions can be scarce until local response services can provide first-hand reports. Fortunately, the widespread availability of smartphones with high-quality cameras has made citizen journalism through social media a valuable source of information for crisis responders. However, analyzing the large volume of images posted by citizens requires more time and effort than is typically available. To address this issue, this paper proposes the use of state-of-the-art deep neural models for automatic image classification/tagging, specifically by adapting transformer-based architectures for crisis image classification (CrisisViT). We leverage the new Incidents1M crisis image dataset to develop a range of new transformer-based image classification models. Through experimentation over the standard Crisis image benchmark dataset, we demonstrate that the CrisisViT models significantly outperform previous approaches in emergency type, image relevance, humanitarian category, and damage severity classification. Additionally, we show that the new Incidents1M dataset can further augment the CrisisViT models resulting in an additional 1.25% absolute accuracy gain.
- [440] arXiv:2401.02843 (cross-list from cs.CY) [ pdf , other ]
-
Title: Thousands of AI Authors on the Future of AIAuthors: Katja Grace , Harlan Stewart , Julia Fabienne Sandkühler , Stephen Thomas , Ben Weinstein-Raun , Jan BraunerComments: The asterisk indicates the corresponding author. The dagger indicates equal contributionSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In the largest survey of its kind, 2,778 researchers who had published in top-tier artificial intelligence (AI) venues gave predictions on the pace of AI progress and the nature and impacts of advanced AI systems The aggregate forecasts give at least a 50% chance of AI systems achieving several milestones by 2028, including autonomously constructing a payment processing site from scratch, creating a song indistinguishable from a new song by a popular musician, and autonomously downloading and fine-tuning a large language model. If science continues undisrupted, the chance of unaided machines outperforming humans in every possible task was estimated at 10% by 2027, and 50% by 2047. The latter estimate is 13 years earlier than that reached in a similar survey we conducted only one year earlier [Grace et al., 2022]. However, the chance of all human occupations becoming fully automatable was forecast to reach 10% by 2037, and 50% as late as 2116 (compared to 2164 in the 2022 survey).
Most respondents expressed substantial uncertainty about the long-term value of AI progress: While 68.3% thought good outcomes from superhuman AI are more likely than bad, of these net optimists 48% gave at least a 5% chance of extremely bad outcomes such as human extinction, and 59% of net pessimists gave 5% or more to extremely good outcomes. Between 38% and 51% of respondents gave at least a 10% chance to advanced AI leading to outcomes as bad as human extinction. More than half suggested that "substantial" or "extreme" concern is warranted about six different AI-related scenarios, including misinformation, authoritarian control, and inequality. There was disagreement about whether faster or slower AI progress would be better for the future of humanity. However, there was broad agreement that research aimed at minimizing potential risks from AI systems ought to be prioritized more. - [441] arXiv:2401.02860 (cross-list from cs.LG) [ pdf , other ]
-
Title: Framework for Variable-lag Motif Following Relation Inference In Time Series using Matrix Profile analysisComments: Revising based on an expert's comments in the research communitySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Knowing who follows whom and what patterns they are following are crucial steps to understand collective behaviors (e.g. a group of human, a school of fish, or a stock market). Time series is one of resources that can be used to get insight regarding following relations. However, the concept of following patterns or motifs and the solution to find them in time series are not obvious. In this work, we formalize a concept of following motifs between two time series and present a framework to infer following patterns between two time series. The framework utilizes one of efficient and scalable methods to retrieve motifs from time series called the Matrix Profile Method. We compare our proposed framework with several baselines. The framework performs better than baselines in the simulation datasets. In the dataset of sound recording, the framework is able to retrieve the following motifs within a pair of time series that two singers sing following each other. In the cryptocurrency dataset, the framework is capable of capturing the following motifs within a pair of time series from two digital currencies, which implies that the values of one currency follow the values of another currency patterns. Our framework can be utilized in any field of time series to get insight regarding following patterns between time series.
- [442] arXiv:2401.02870 (cross-list from cs.MA) [ pdf , other ]
-
Title: AFSPP: Agent Framework for Shaping Preference and Personality with Large Language ModelsSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The evolution of Large Language Models (LLMs) has introduced a new paradigm for investigating human behavior emulation. Recent research has employed LLM-based Agents to create a sociological research environment, in which agents exhibit behavior based on the unfiltered characteristics of large language models. However, these studies overlook the iterative development within a human-like setting - Human preferences and personalities are complex, shaped by various factors and subject to ongoing change as a result of environmental and subjective influences. In light of this observation, we propose Agent Framework for Shaping Preference and Personality (AFSPP), exploring the multifaceted impact of social networks and subjective consciousness on LLM-based Agents' preference and personality formation. With AFSPP, we have, for the first time, successfully replicated several key findings from human personality experiments. And other AFSPP-based experimental results indicate that plan making, sensory perceptions and social networking with subjective information, wield the most pronounced influence on preference shaping. AFSPP can significantly enhance the efficiency and scope of psychological experiments, while yielding valuable insights for Trustworthy Artificial Intelligence research for strategies to prevent undesirable preference and personality development.
- [443] arXiv:2401.02905 (cross-list from cs.LG) [ pdf , other ]
-
Title: H2G2-Net: A Hierarchical Heterogeneous Graph Generative Network Framework for Discovery of Multi-Modal Physiological ResponsesAuthors: Haidong Gu , Nathan Gaw , Yinan Wang , Chancellor Johnstone , Christine Beauchene , Sophia Yuditskaya , Hrishikesh Rao , Chun-An ChouComments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024 ( this https URL )Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Discovering human cognitive and emotional states using multi-modal physiological signals draws attention across various research applications. Physiological responses of the human body are influenced by human cognition and commonly used to analyze cognitive states. From a network science perspective, the interactions of these heterogeneous physiological modalities in a graph structure may provide insightful information to support prediction of cognitive states. However, there is no clue to derive exact connectivity between heterogeneous modalities and there exists a hierarchical structure of sub-modalities. Existing graph neural networks are designed to learn on non-hierarchical homogeneous graphs with pre-defined graph structures; they failed to learn from hierarchical, multi-modal physiological data without a pre-defined graph structure. To this end, we propose a hierarchical heterogeneous graph generative network (H2G2-Net) that automatically learns a graph structure without domain knowledge, as well as a powerful representation on the hierarchical heterogeneous graph in an end-to-end fashion. We validate the proposed method on the CogPilot dataset that consists of multi-modal physiological signals. Extensive experiments demonstrate that our proposed method outperforms the state-of-the-art GNNs by 5%-20% in prediction accuracy.
- [444] arXiv:2401.02920 (cross-list from cs.DC) [ pdf , other ]
-
Title: Analytically-Driven Resource Management for Cloud-Native MicroservicesSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI)
Abstract: Resource management for cloud-native microservices has attracted a lot of recent attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in terms of both SLA maintenance and resource efficiency. However, ML-driven approaches also face challenges including lengthy data collection processes and limited scalability. We present Ursa, a lightweight resource management system for cloud-native microservices that addresses these challenges. Ursa uses an analytical model that decomposes the end-to-end SLA into per-service SLA, and maps per-service SLA to individual resource allocations per microservice tier. To speed up the exploration process and avoid prolonged SLA violations, Ursa explores each microservice individually, and swiftly stops exploration if latency exceeds its SLA.
We evaluate Ursa on a set of representative and end-to-end microservice topologies, including a social network, media service and video processing pipeline, each consisting of multiple classes and priorities of requests with different SLAs, and compare it against two representative ML-driven systems, Sinan and Firm. Compared to these ML-driven approaches, Ursa provides significant advantages: It shortens the data collection process by more than 128x, and its control plane is 43x faster than ML-driven approaches. At the same time, Ursa does not sacrifice resource efficiency or SLAs. During online deployment, Ursa reduces the SLA violation rate by 9.0% up to 49.9%, and reduces CPU allocation by up to 86.2% compared to ML-driven approaches. - [445] arXiv:2401.02941 (cross-list from cs.CV) [ pdf , other ]
-
Title: Unsupervised Federated Domain Adaptation for Segmentation of MRI ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Automatic semantic segmentation of magnetic resonance imaging (MRI) images using deep neural networks greatly assists in evaluating and planning treatments for various clinical applications. However, training these models is conditioned on the availability of abundant annotated data to implement the end-to-end supervised learning procedure. Even if we annotate enough data, MRI images display considerable variability due to factors such as differences in patients, MRI scanners, and imaging protocols. This variability necessitates retraining neural networks for each specific application domain, which, in turn, requires manual annotation by expert radiologists for all new domains. To relax the need for persistent data annotation, we develop a method for unsupervised federated domain adaptation using multiple annotated source domains. Our approach enables the transfer of knowledge from several annotated source domains to adapt a model for effective use in an unannotated target domain. Initially, we ensure that the target domain data shares similar representations with each source domain in a latent embedding space, modeled as the output of a deep encoder, by minimizing the pair-wise distances of the distributions for the target domain and the source domains. We then employ an ensemble approach to leverage the knowledge obtained from all domains. We provide theoretical analysis and perform experiments on the MICCAI 2016 multi-site dataset to demonstrate our method is effective.
- [446] arXiv:2401.02949 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph2Tac: Learning Hierarchical Representations of Math Concepts in Theorem provingAuthors: Jason Rute , Miroslav Olšák , Lasse Blaauwbroek , Fidel Ivan Schaposnik Massolo , Jelle Piepenbrock , Vasily PestunComments: 32 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Concepts abound in mathematics and its applications. They vary greatly between subject areas, and new ones are introduced in each mathematical paper or application. A formal theory builds a hierarchy of definitions, theorems and proofs that reference each other. When an AI agent is proving a new theorem, most of the mathematical concepts and lemmas relevant to that theorem may have never been seen during training. This is especially true in the Coq proof assistant, which has a diverse library of Coq projects, each with its own definitions, lemmas, and even custom tactic procedures used to prove those lemmas. It is essential for agents to incorporate such new information into their knowledge base on the fly. We work towards this goal by utilizing a new, large-scale, graph-based dataset for machine learning in Coq. We leverage a faithful graph-representation of Coq terms that induces a directed graph of dependencies between definitions to create a novel graph neural network, Graph2Tac (G2T), that takes into account not only the current goal, but also the entire hierarchy of definitions that led to the current goal. G2T is an online model that is deeply integrated into the users' workflow and can adapt in real time to new Coq projects and their definitions. It complements well with other online models that learn in real time from new proof scripts. Our novel definition embedding task, which is trained to compute representations of mathematical concepts not seen during training, boosts the performance of the neural network to rival state-of-the-art k-nearest neighbor predictors.
- [447] arXiv:2401.02954 (cross-list from cs.CL) [ pdf , other ]
-
Title: DeepSeek LLM: Scaling Open-Source Language Models with LongtermismAuthors: DeepSeek-AI : Xiao Bi , Deli Chen , Guanting Chen , Shanhuang Chen , Damai Dai , Chengqi Deng , Honghui Ding , Kai Dong , Qiushi Du , Zhe Fu , Huazuo Gao , Kaige Gao , Wenjun Gao , Ruiqi Ge , Kang Guan , Daya Guo , Jianzhong Guo , Guangbo Hao , Zhewen Hao , Ying He , Wenjie Hu , Panpan Huang , Erhang Li , Guowei Li , Jiashi Li , Yao Li , Y.K. Li , Wenfeng Liang , Fangyun Lin , A.X. Liu , Bo Liu , Wen Liu , Xiaodong Liu , Xin Liu , Yiyuan Liu , Haoyu Lu , Shanghao Lu , Fuli Luo , Shirong Ma , Xiaotao Nie , Tian Pei , Yishi Piao , Junjie Qiu , Hui Qu , Tongzheng Ren , Zehui Ren , Chong Ruan , Zhangli Sha , Zhihong Shao , Junxiao Song , Xuecheng Su , Jingxiang Sun , Yaofeng Sun , Minghui Tang , Bingxuan Wang , Peiyi Wang , Shiyu Wang , Yaohui Wang , Yongji Wang , Tong Wu , Y. Wu , Xin Xie , Zhenda Xie , Ziwei Xie , Yiliang Xiong , Hanwei Xu , R.X. Xu , Yanhong Xu , et al. (18 additional authors not shown)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.
- [448] arXiv:2401.02971 (cross-list from cs.CL) [ pdf , other ]
-
Title: Deep Anomaly Detection in TextAuthors: Andrei ManolacheComments: M.Sc. thesis, University of Bucharest, Faculty of Mathematics and Computer Sciences, 2021Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Deep anomaly detection methods have become increasingly popular in recent years, with methods like Stacked Autoencoders, Variational Autoencoders, and Generative Adversarial Networks greatly improving the state-of-the-art. Other methods rely on augmenting classical models (such as the One-Class Support Vector Machine), by learning an appropriate kernel function using Neural Networks. Recent developments in representation learning by self-supervision are proving to be very beneficial in the context of anomaly detection. Inspired by the advancements in anomaly detection using self-supervised learning in the field of computer vision, this thesis aims to develop a method for detecting anomalies by exploiting pretext tasks tailored for text corpora. This approach greatly improves the state-of-the-art on two datasets, 20Newsgroups, and AG News, for both semi-supervised and unsupervised anomaly detection, thus proving the potential for self-supervised anomaly detectors in the field of natural language processing.
- [449] arXiv:2401.02974 (cross-list from cs.CL) [ pdf , other ]
-
Title: Efficacy of Utilizing Large Language Models to Detect Public Threat Posted OnlineComments: 10 pages, 4 figures (1 image figure saved in PNG)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: This paper examines the efficacy of utilizing large language models (LLMs) to detect public threats posted online. Amid rising concerns over the spread of threatening rhetoric and advance notices of violence, automated content analysis techniques may aid in early identification and moderation. Custom data collection tools were developed to amass post titles from a popular Korean online community, comprising 500 non-threat examples and 20 threats. Various LLMs (GPT-3.5, GPT-4, PaLM) were prompted to classify individual posts as either "threat" or "safe." Statistical analysis found all models demonstrated strong accuracy, passing chi-square goodness of fit tests for both threat and non-threat identification. GPT-4 performed best overall with 97.9% non-threat and 100% threat accuracy. Affordability analysis also showed PaLM API pricing as highly cost-efficient. The findings indicate LLMs can effectively augment human content moderation at scale to help mitigate emerging online risks. However, biases, transparency, and ethical oversight remain vital considerations before real-world implementation.
- [450] arXiv:2401.02976 (cross-list from cs.CL) [ pdf , other ]
-
Title: Trace and Edit Relation Associations in GPTSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This study introduces a novel approach for analyzing and modifying entity relationships in GPT models, diverging from ROME's entity-focused methods. We develop a relation tracing technique to understand the influence of language model computations on relationship judgments. Using the FewRel dataset, we identify key roles of MLP modules and attention mechanisms in processing relationship information. Our method, tested against ROME on a new dataset, shows improved balance in specificity and generalization, underscoring the potential of manipulating early-layer modules for enhanced model understanding and accuracy.
- [451] arXiv:2401.02978 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Learning from a Generative AI Predecessor -- The Many Motivations for Interacting with Conversational AgentsComments: 26 pages, 18 figures, 2 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: For generative AI to succeed, how engaging a conversationalist must it be? For almost sixty years, some conversational agents have responded to any question or comment to keep a conversation going. In recent years, several utilized machine learning or sophisticated language processing, such as Tay, Xiaoice, Zo, Hugging Face, Kuki, and Replika. Unlike generative AI, they focused on engagement, not expertise. Millions of people were motivated to engage with them. What were the attractions? Will generative AI do better if it is equally engaging, or should it be less engaging? Prior to the emergence of generative AI, we conducted a large-scale quantitative and qualitative analysis to learn what motivated millions of people to engage with one such 'virtual companion,' Microsoft's Zo. We examined the complete chat logs of 2000 anonymized people. We identified over a dozen motivations that people had for interacting with this software. Designers learned different ways to increase engagement. Generative conversational AI does not yet have a clear revenue model to address its high cost. It might benefit from being more engaging, even as it supports productivity and creativity. Our study and analysis point to opportunities and challenges.
- [452] arXiv:2401.02979 (cross-list from cs.CL) [ pdf , other ]
-
Title: Are we describing the same sound? An analysis of word embedding spaces of expressive piano performanceJournal-ref: Proceedings of the Forum for Information Retrieval Evaluation, FIRE, 2023, Panjim, IndiaSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Semantic embeddings play a crucial role in natural language-based information retrieval. Embedding models represent words and contexts as vectors whose spatial configuration is derived from the distribution of words in large text corpora. While such representations are generally very powerful, they might fail to account for fine-grained domain-specific nuances. In this article, we investigate this uncertainty for the domain of characterizations of expressive piano performance. Using a music research dataset of free text performance characterizations and a follow-up study sorting the annotations into clusters, we derive a ground truth for a domain-specific semantic similarity structure. We test five embedding models and their similarity structure for correspondence with the ground truth. We further assess the effects of contextualizing prompts, hubness reduction, cross-modal similarity, and k-means clustering. The quality of embedding models shows great variability with respect to this task; more general models perform better than domain-adapted ones and the best model configurations reach human-level agreement.
- [453] arXiv:2401.02981 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Fine-tuning and Utilization Methods of Domain-specific LLMsAuthors: Cheonsu JeongSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent releases of pre-trained Large Language Models (LLMs) have gained considerable traction, yet research on fine-tuning and employing domain-specific LLMs remains scarce. This study investigates approaches for fine-tuning and leveraging domain-specific LLMs, highlighting trends in LLMs, foundational models, and methods for domain-specific pre-training. Focusing on the financial sector, it details dataset selection, preprocessing, model choice, and considerations crucial for LLM fine-tuning in finance. Addressing the unique characteristics of financial data, the study explores the construction of domain-specific vocabularies and considerations for security and regulatory compliance. In the practical application of LLM fine-tuning, the study outlines the procedure and implementation for generating domain-specific LLMs in finance. Various financial cases, including stock price prediction, sentiment analysis of financial news, automated document processing, research, information extraction, and customer service enhancement, are exemplified. The study explores the potential of LLMs in the financial domain, identifies limitations, and proposes directions for improvement, contributing valuable insights for future research. Ultimately, it advances natural language processing technology in business, suggesting proactive LLM utilization in financial services across industries.
- [454] arXiv:2401.02982 (cross-list from cs.CL) [ pdf , other ]
-
Title: BIBench: Benchmarking Data Analysis Knowledge of Large Language ModelsAuthors: Shu Liu , Shangqing Zhao , Chenghao Jia , Xinlin Zhuang , Zhaoguang Long , Qingquan Wu , Chong Yang , Aimin Zhou , Man LanSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks. However, their proficiency and reliability in the specialized domain of Data Analysis, particularly with a focus on data-driven thinking, remain uncertain. To bridge this gap, we introduce BIBench, a comprehensive benchmark designed to evaluate the data analysis capabilities of LLMs within the context of Business Intelligence (BI). BIBench assesses LLMs across three dimensions: 1) BI foundational knowledge, evaluating the models' numerical reasoning and familiarity with financial concepts; 2) BI knowledge application, determining the models' ability to quickly comprehend textual information and generate analysis questions from multiple views; and 3) BI technical skills, examining the models' use of technical knowledge to address real-world data analysis challenges. BIBench comprises 11 sub-tasks, spanning three categories of task types: classification, extraction, and generation. Additionally, we've developed BIChat, a domain-specific dataset with over a million data points, to fine-tune LLMs. We will release BIBenchmark, BIChat, and the evaluation scripts at \url{ this https URL }. This benchmark aims to provide a measure for in-depth analysis of LLM abilities and foster the advancement of LLMs in the field of data analysis.
- [455] arXiv:2401.02984 (cross-list from cs.CL) [ pdf , other ]
-
Title: Large Language Models in Mental Health Care: a Scoping ReviewAuthors: Yining Hua , Fenglin Liu , Kailai Yang , Zehan Li , Yi-han Sheu , Peilin Zhou , Lauren V. Moran , Sophia Ananiadou , Andrew BeamSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Objective: The growing use of large language models (LLMs) stimulates a need for a comprehensive review of their applications and outcomes in mental health care contexts. This scoping review aims to critically analyze the existing development and applications of LLMs in mental health care, highlighting their successes and identifying their challenges and limitations in these specialized fields. Materials and Methods: A broad literature search was conducted in November 2023 using six databases (PubMed, Web of Science, Google Scholar, arXiv, medRxiv, and PsyArXiv) following the 2020 version of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 313 publications were initially identified, and after applying the study inclusion criteria, 34 publications were selected for the final review. Results: We identified diverse applications of LLMs in mental health care, including diagnosis, therapy, patient engagement enhancement, etc. Key challenges include data availability and reliability, nuanced handling of mental states, and effective evaluation methods. Despite successes in accuracy and accessibility improvement, gaps in clinical applicability and ethical considerations were evident, pointing to the need for robust data, standardized evaluations, and interdisciplinary collaboration. Conclusion: LLMs show promising potential in advancing mental health care, with applications in diagnostics, and patient support. Continued advancements depend on collaborative, multidisciplinary efforts focused on framework enhancement, rigorous dataset development, technological refinement, and ethical integration to ensure the effective and safe application of LLMs in mental health care.
- [456] arXiv:2401.02985 (cross-list from cs.CL) [ pdf , other ]
-
Title: Evaluating Large Language Models on the GMAT: Implications for the Future of Business EducationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The rapid evolution of artificial intelligence (AI), especially in the domain of Large Language Models (LLMs) and generative AI, has opened new avenues for application across various fields, yet its role in business education remains underexplored. This study introduces the first benchmark to assess the performance of seven major LLMs, OpenAI's models (GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo), Google's models (PaLM 2, Gemini 1.0 Pro), and Anthropic's models (Claude 2 and Claude 2.1), on the GMAT, which is a key exam in the admission process for graduate business programs. Our analysis shows that most LLMs outperform human candidates, with GPT-4 Turbo not only outperforming the other models but also surpassing the average scores of graduate students at top business schools. Through a case study, this research examines GPT-4 Turbo's ability to explain answers, evaluate responses, identify errors, tailor instructions, and generate alternative scenarios. The latest LLM versions, GPT-4 Turbo, Claude 2.1, and Gemini 1.0 Pro, show marked improvements in reasoning tasks compared to their predecessors, underscoring their potential for complex problem-solving. While AI's promise in education, assessment, and tutoring is clear, challenges remain. Our study not only sheds light on LLMs' academic potential but also emphasizes the need for careful development and application of AI in education. As AI technology advances, it is imperative to establish frameworks and protocols for AI interaction, verify the accuracy of AI-generated content, ensure worldwide access for diverse learners, and create an educational environment where AI supports human expertise. This research sets the stage for further exploration into the responsible use of AI to enrich educational experiences and improve exam preparation and assessment methods.
- [457] arXiv:2401.02986 (cross-list from cs.CL) [ pdf , other ]
-
Title: Identification of Regulatory Requirements Relevant to Business Processes: A Comparative Study on Generative AI, Embedding-based Ranking, Crowd and Expert-driven MethodsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Organizations face the challenge of ensuring compliance with an increasing amount of requirements from various regulatory documents. Which requirements are relevant depends on aspects such as the geographic location of the organization, its domain, size, and business processes. Considering these contextual factors, as a first step, relevant documents (e.g., laws, regulations, directives, policies) are identified, followed by a more detailed analysis of which parts of the identified documents are relevant for which step of a given business process. Nowadays the identification of regulatory requirements relevant to business processes is mostly done manually by domain and legal experts, posing a tremendous effort on them, especially for a large number of regulatory documents which might frequently change. Hence, this work examines how legal and domain experts can be assisted in the assessment of relevant requirements. For this, we compare an embedding-based NLP ranking method, a generative AI method using GPT-4, and a crowdsourced method with the purely manual method of creating relevancy labels by experts. The proposed methods are evaluated based on two case studies: an Australian insurance case created with domain experts and a global banking use case, adapted from SAP Signavio's workflow example of an international guideline. A gold standard is created for both BPMN2.0 processes and matched to real-world textual requirements from multiple regulatory documents. The evaluation and discussion provide insights into strengths and weaknesses of each method regarding applicability, automation, transparency, and reproducibility and provide guidelines on which method combinations will maximize benefits for given characteristics such as process usage, impact, and dynamics of an application scenario.
- [458] arXiv:2401.02987 (cross-list from cs.CL) [ pdf , other ]
-
Title: Has Your Pretrained Model Improved? A Multi-head Posterior Based ApproachAuthors: Prince Aboagye , Yan Zheng , Junpeng Wang , Uday Singh Saini , Xin Dai , Michael Yeh , Yujie Fan , Zhongfang Zhuang , Shubham Jain , Liang Wang , Wei ZhangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The emergence of pre-trained models has significantly impacted Natural Language Processing (NLP) and Computer Vision to relational datasets. Traditionally, these models are assessed through fine-tuned downstream tasks. However, this raises the question of how to evaluate these models more efficiently and more effectively. In this study, we explore a novel approach where we leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models. We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models. Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
- [459] arXiv:2401.02991 (cross-list from cs.CL) [ pdf , other ]
-
Title: GLIDE-RL: Grounded Language Instruction through DEmonstration in RLAuthors: Chaitanya Kharyal , Sai Krishna Gottipati , Tanmay Kumar Sinha , Srijita Das , Matthew E. TaylorComments: 12 pages, 6 figures, to be presented at AAMAS 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: One of the final frontiers in the development of complex human - AI collaborative systems is the ability of AI agents to comprehend the natural language and perform tasks accordingly. However, training efficient Reinforcement Learning (RL) agents grounded in natural language has been a long-standing challenge due to the complexity and ambiguity of the language and sparsity of the rewards, among other factors. Several advances in reinforcement learning, curriculum learning, continual learning, language models have independently contributed to effective training of grounded agents in various environments. Leveraging these developments, we present a novel algorithm, Grounded Language Instruction through DEmonstration in RL (GLIDE-RL) that introduces a teacher-instructor-student curriculum learning framework for training an RL agent capable of following natural language instructions that can generalize to previously unseen language instructions. In this multi-agent framework, the teacher and the student agents learn simultaneously based on the student's current skill level. We further demonstrate the necessity for training the student agent with not just one, but multiple teacher agents. Experiments on a complex sparse reward environment validates the effectiveness of our proposed approach.
- [460] arXiv:2401.02992 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Advanced Unstructured Data Processing for ESG Reports: A Methodology for Structured Transformation and Enhanced AnalysisAuthors: Jiahui Peng , Jing Gao , Xin Tong , Jing Guo , Hang Yang , Jianchuan Qi , Ruiqiao Li , Nan Li , Ming XuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In the evolving field of corporate sustainability, analyzing unstructured Environmental, Social, and Governance (ESG) reports is a complex challenge due to their varied formats and intricate content. This study introduces an innovative methodology utilizing the "Unstructured Core Library", specifically tailored to address these challenges by transforming ESG reports into structured, analyzable formats. Our approach significantly advances the existing research by offering high-precision text cleaning, adept identification and extraction of text from images, and standardization of tables within these reports. Emphasizing its capability to handle diverse data types, including text, images, and tables, the method adeptly manages the nuances of differing page layouts and report styles across industries. This research marks a substantial contribution to the fields of industrial ecology and corporate sustainability assessment, paving the way for the application of advanced NLP technologies and large language models in the analysis of corporate governance and sustainability. Our code is available at this https URL .
- [461] arXiv:2401.02993 (cross-list from cs.CL) [ pdf , other ]
-
Title: Improving Natural Language Understanding with Computation-Efficient Retrieval Representation FusionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Retrieval-based augmentations that aim to incorporate knowledge from an external database into language models have achieved great success in various knowledge-intensive (KI) tasks, such as question-answering and text generation. However, integrating retrievals in non-knowledge-intensive (NKI) tasks, such as text classification, is still challenging. Existing works focus on concatenating retrievals to inputs as context to form the prompt-based inputs. Unfortunately, such methods require language models to have the capability to handle long texts. Besides, inferring such concatenated data would also consume a significant amount of computational resources.
To solve these challenges, we propose \textbf{ReFusion} in this paper, a computation-efficient \textbf{Re}trieval representation \textbf{Fusion} with neural architecture search. The main idea is to directly fuse the retrieval representations into the language models. Specifically, we first propose an online retrieval module that retrieves representations of similar sentences. Then, we present a retrieval fusion module including two effective ranking schemes, i.e., reranker-based scheme and ordered-mask-based scheme, to fuse the retrieval representations with hidden states. Furthermore, we use Neural Architecture Search (NAS) to seek the optimal fusion structure across different layers. Finally, we conduct comprehensive experiments, and the results demonstrate our ReFusion can achieve superior and robust performance on various NKI tasks. - [462] arXiv:2401.02994 (cross-list from cs.CL) [ pdf , other ]
-
Title: Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLMAuthors: Xiaoding Lu , Zongyi Liu , Adian Liusie , Vyas Raina , Vineet Mudupalli , Yuwen Zhang , William BeauchampSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In conversational AI research, there's a noticeable trend towards developing models with a larger number of parameters, exemplified by models like ChatGPT. While these expansive models tend to generate increasingly better chat responses, they demand significant computational resources and memory. This study explores a pertinent question: Can a combination of smaller models collaboratively achieve comparable or enhanced performance relative to a singular large model? We introduce an approach termed "blending", a straightforward yet effective method of integrating multiple chat AIs. Our empirical evidence suggests that when specific smaller models are synergistically blended, they can potentially outperform or match the capabilities of much larger counterparts. For instance, integrating just three models of moderate size (6B/13B paramaeters) can rival or even surpass the performance metrics of a substantially larger model like ChatGPT (175B+ paramaters). This hypothesis is rigorously tested using A/B testing methodologies with a large user base on the Chai research platform over a span of thirty days. The findings underscore the potential of the "blending" strategy as a viable approach for enhancing chat AI efficacy without a corresponding surge in computational demands.
- [463] arXiv:2401.02995 (cross-list from cs.CL) [ pdf , other ]
-
Title: CANAMRF: An Attention-Based Model for Multimodal Depression DetectionComments: 6 pages, 3 figures. Pacific Rim International Conference on Artificial Intelligence. Singapore: Springer Nature Singapore, 2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract: Multimodal depression detection is an important research topic that aims to predict human mental states using multimodal data. Previous methods treat different modalities equally and fuse each modality by naïve mathematical operations without measuring the relative importance between them, which cannot obtain well-performed multimodal representations for downstream depression tasks. In order to tackle the aforementioned concern, we present a Cross-modal Attention Network with Adaptive Multi-modal Recurrent Fusion (CANAMRF) for multimodal depression detection. CANAMRF is constructed by a multimodal feature extractor, an Adaptive Multimodal Recurrent Fusion module, and a Hybrid Attention Module. Through experimentation on two benchmark datasets, CANAMRF demonstrates state-of-the-art performance, underscoring the effectiveness of our proposed approach.
- [464] arXiv:2401.02997 (cross-list from cs.CL) [ pdf , other ]
-
Title: Blar-SQL: Faster, Stronger, Smaller NL2SQLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have gained considerable notoriety in the field of natural language to SQL tasks (NL2SQL). In this study, we show how task decomposition can greatly benefit LLMs in database understanding and query generation in order to answer human questions with an SQL query.
We fined-tuned open source models, specifically Llama-2 and Code Llama, by combining 2 different models each designated to focus on one of two tasks in order to leverage each model's core competency to further increase the accuracy of the final SQL query.
We propose a new framework to divide the schema into chunks in order to fit more information into a limited context. Our results are comparable with those obtained by GPT-4 at the same time being 135 times smaller, 90 times faster and more than 100 times cheaper than GPT-4. - [465] arXiv:2401.03000 (cross-list from cs.SD) [ pdf , other ]
-
Title: Bridging Modalities: Knowledge Distillation and Masked Training for Translating Multi-Modal Emotion Recognition to Uni-Modal, Speech-Only Emotion RecognitionSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: This paper presents an innovative approach to address the challenges of translating multi-modal emotion recognition models to a more practical and resource-efficient uni-modal counterpart, specifically focusing on speech-only emotion recognition. Recognizing emotions from speech signals is a critical task with applications in human-computer interaction, affective computing, and mental health assessment. However, existing state-of-the-art models often rely on multi-modal inputs, incorporating information from multiple sources such as facial expressions and gestures, which may not be readily available or feasible in real-world scenarios. To tackle this issue, we propose a novel framework that leverages knowledge distillation and masked training techniques.
- [466] arXiv:2401.03006 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: The Rise of Diffusion Models in Time-Series ForecastingComments: Version 2, 24 pages, 10 figures, 12 tables, For complete LuaTeX source: this https URL , Written by: Caspar Meijer, Supervised by: Lydia Y. ChenSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This survey delves into the application of diffusion models in time-series forecasting. Diffusion models are demonstrating state-of-the-art results in various fields of generative AI. The paper includes comprehensive background information on diffusion models, detailing their conditioning methods and reviewing their use in time-series forecasting. The analysis covers 11 specific time-series implementations, the intuition and theory behind them, the effectiveness on different datasets, and a comparison among each other. Key contributions of this work are the thorough exploration of diffusion models' applications in time-series forecasting and a chronologically ordered overview of these models. Additionally, the paper offers an insightful discussion on the current state-of-the-art in this domain and outlines potential future research directions. This serves as a valuable resource for researchers in AI and time-series analysis, offering a clear view of the latest advancements and future potential of diffusion models.
- [467] arXiv:2401.03040 (cross-list from cs.LG) [ pdf , other ]
-
Title: AccidentGPT: Large Multi-Modal Foundation Model for Traffic Accident AnalysisComments: 8 pages, 2 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Traffic accident analysis is pivotal for enhancing public safety and developing road regulations. Traditional approaches, although widely used, are often constrained by manual analysis processes, subjective decisions, uni-modal outputs, as well as privacy issues related to sensitive data. This paper introduces the idea of AccidentGPT, a foundation model of traffic accident analysis, which incorporates multi-modal input data to automatically reconstruct the accident process video with dynamics details, and furthermore provide multi-task analysis with multi-modal outputs. The design of the AccidentGPT is empowered with a multi-modality prompt with feedback for task-oriented adaptability, a hybrid training schema to leverage labelled and unlabelled data, and a edge-cloud split configuration for data privacy. To fully realize the functionalities of this model, we proposes several research opportunities. This paper serves as the stepping stone to fill the gaps in traditional approaches of traffic accident analysis and attract the research community attention for automatic, objective, and privacy-preserving traffic accident analysis.
- [468] arXiv:2401.03059 (cross-list from cs.LG) [ pdf , other ]
-
Title: Reliability-Optimized User Admission Control for URLLC Traffic: A Neural Contextual Bandit ApproachComments: To be published in the proceedings of the 2024 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Abstract: Ultra-reliable low-latency communication (URLLC) is the cornerstone for a broad range of emerging services in next-generation wireless networks. URLLC fundamentally relies on the network's ability to proactively determine whether sufficient resources are available to support the URLLC traffic, and thus, prevent so-called cell overloads. Nonetheless, achieving accurate quality-of-service (QoS) predictions for URLLC user equipment (UEs) and preventing cell overloads are very challenging tasks. This is due to dependency of the QoS metrics (latency and reliability) on traffic and channel statistics, users' mobility, and interdependent performance across UEs. In this paper, a new QoS-aware UE admission control approach is developed to proactively estimate QoS for URLLC UEs, prior to associating them with a cell, and accordingly, admit only a subset of UEs that do not lead to a cell overload. To this end, an optimization problem is formulated to find an efficient UE admission control policy, cognizant of UEs' QoS requirements and cell-level load dynamics. To solve this problem, a new machine learning based method is proposed that builds on (deep) neural contextual bandits, a suitable framework for dealing with nonlinear bandit problems. In fact, the UE admission controller is treated as a bandit agent that observes a set of network measurements (context) and makes admission control decisions based on context-dependent QoS (reward) predictions. The simulation results show that the proposed scheme can achieve near-optimal performance and yield substantial gains in terms of cell-level service reliability and efficient resource utilization.
- [469] arXiv:2401.03065 (cross-list from cs.SE) [ pdf , other ]
-
Title: CRUXEval: A Benchmark for Code Reasoning, Understanding and ExecutionAuthors: Alex Gu , Baptiste Rozière , Hugh Leather , Armando Solar-Lezama , Gabriel Synnaeve , Sida I. WangComments: 71 pages, 29 figuresSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an input-output pair, leading to two natural tasks: input prediction and output prediction. First, we propose a generic recipe for generating our execution benchmark which can be used to create future variation of the benchmark. Second, we evaluate twenty code models on our benchmark and discover that many recent high-scoring models on HumanEval do not show the same improvements on our benchmark. Third, we show that simple CoT and fine-tuning schemes can improve performance on our benchmark but remain far from solving it. The best setup, GPT-4 with chain of thought (CoT), achieves a pass@1 of 75% and 81% on input and output prediction, respectively. In contrast, Code Llama 34B achieves a pass@1 of 50% and 46% on input and output prediction, highlighting the gap between open and closed source models. As no model is close to acing CRUXEval, we provide examples of consistent GPT-4 failures on simple programs as a lens into its code reasoning capabilities and areas for improvement.
- [470] arXiv:2401.03131 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Physics-guided Generative AI Toolkit for Geophysical MonitoringSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Signal Processing (eess.SP); Geophysics (physics.geo-ph)
Abstract: Full-waveform inversion (FWI) plays a vital role in geoscience to explore the subsurface. It utilizes the seismic wave to image the subsurface velocity map. As the machine learning (ML) technique evolves, the data-driven approaches using ML for FWI tasks have emerged, offering enhanced accuracy and reduced computational cost compared to traditional physics-based methods. However, a common challenge in geoscience, the unprivileged data, severely limits ML effectiveness. The issue becomes even worse during model pruning, a step essential in geoscience due to environmental complexities. To tackle this, we introduce the EdGeo toolkit, which employs a diffusion-based model guided by physics principles to generate high-fidelity velocity maps. The toolkit uses the acoustic wave equation to generate corresponding seismic waveform data, facilitating the fine-tuning of pruned ML models. Our results demonstrate significant improvements in SSIM scores and reduction in both MAE and MSE across various pruning ratios. Notably, the ML model fine-tuned using data generated by EdGeo yields superior quality of velocity maps, especially in representing unprivileged features, outperforming other existing methods.
- [471] arXiv:2401.03134 (cross-list from cs.LG) [ pdf , other ]
-
Title: TimeGraphs: Graph-based Temporal ReasoningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Many real-world systems exhibit temporal, dynamic behaviors, which are captured as time series of complex agent interactions. To perform temporal reasoning, current methods primarily encode temporal dynamics through simple sequence-based models. However, in general these models fail to efficiently capture the full spectrum of rich dynamics in the input, since the dynamics is not uniformly distributed. In particular, relevant information might be harder to extract and computing power is wasted for processing all individual timesteps, even if they contain no significant changes or no new information. Here we propose TimeGraphs, a novel approach that characterizes dynamic interactions as a hierarchical temporal graph, diverging from traditional sequential representations. Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales. Adopting a self-supervised method, TimeGraphs constructs a multi-level event hierarchy from a temporal input, which is then used to efficiently reason about the unevenly distributed dynamics. This construction process is scalable and incremental to accommodate streaming data. We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset. The results demonstrate both robustness and efficiency of TimeGraphs on a range of temporal reasoning tasks. Our approach obtains state-of-the-art performance and leads to a performance increase of up to 12.2% on event prediction and recognition tasks over current approaches. Our experiments further demonstrate a wide array of capabilities including zero-shot generalization, robustness in case of data sparsity, and adaptability to streaming data flow.
- [472] arXiv:2401.03137 (cross-list from cs.LG) [ pdf , other ]
-
Title: SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement LearningComments: Published as a conference paper at NeurIPS 23Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Alleviating overestimation bias is a critical challenge for deep reinforcement learning to achieve successful performance on more complex tasks or offline datasets containing out-of-distribution data. In order to overcome overestimation bias, ensemble methods for Q-learning have been investigated to exploit the diversity of multiple Q-functions. Since network initialization has been the predominant approach to promote diversity in Q-functions, heuristically designed diversity injection methods have been studied in the literature. However, previous studies have not attempted to approach guaranteed independence over an ensemble from a theoretical perspective. By introducing a novel regularization loss for Q-ensemble independence based on random matrix theory, we propose spiked Wishart Q-ensemble independence regularization (SPQR) for reinforcement learning. Specifically, we modify the intractable hypothesis testing criterion for the Q-ensemble independence into a tractable KL divergence between the spectral distribution of the Q-ensemble and the target Wigner's semicircle distribution. We implement SPQR in several online and offline ensemble Q-learning algorithms. In the experiments, SPQR outperforms the baseline algorithms in both online and offline RL benchmarks.
- [473] arXiv:2401.03138 (cross-list from cs.LG) [ pdf , other ]
-
Title: TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph ModelingComments: 7 pages, 7 figures, 4 tables. Accepted by AAAI-24-IAAI, to appearSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: To address the limitations of traffic prediction from location-bound detectors, we present Geographical Cellular Traffic (GCT) flow, a novel data source that leverages the extensive coverage of cellular traffic to capture mobility patterns. Our extensive analysis validates its potential for transportation. Focusing on vehicle-related GCT flow prediction, we propose a graph neural network that integrates multivariate, temporal, and spatial facets for improved accuracy. Experiments reveal our model's superiority over baselines, especially in long-term predictions. We also highlight the potential for GCT flow integration into transportation systems.
- [474] arXiv:2401.03154 (cross-list from cs.RO) [ pdf , other ]
-
Title: Decentralized Multi-Agent Active Search and Tracking when Targets Outnumber AgentsComments: Under reviewSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: Multi-agent multi-target tracking has a wide range of applications, including wildlife patrolling, security surveillance or environment monitoring. Such algorithms often make restrictive assumptions: the number of targets and/or their initial locations may be assumed known, or agents may be pre-assigned to monitor disjoint partitions of the environment, reducing the burden of exploration. This also limits applicability when there are fewer agents than targets, since agents are unable to continuously follow the targets in their fields of view. Multi-agent tracking algorithms additionally assume inter-agent synchronization of observations, or the presence of a central controller to coordinate joint actions. Instead, we focus on the setting of decentralized multi-agent, multi-target, simultaneous active search-and-tracking with asynchronous inter-agent communication. Our proposed algorithm DecSTER uses a sequential monte carlo implementation of the probability hypothesis density filter for posterior inference combined with Thompson sampling for decentralized multi-agent decision making. We compare different action selection policies, focusing on scenarios where targets outnumber agents. In simulation, we demonstrate that DecSTER is robust to unreliable inter-agent communication and outperforms information-greedy baselines in terms of the Optimal Sub-Pattern Assignment (OSPA) metric for different numbers of targets and varying teamsizes.
- [475] arXiv:2401.03158 (cross-list from cs.CL) [ pdf , other ]
-
Title: Quartet Logic: A Four-Step Reasoning (QLFR) framework for advancing Short Text ClassificationAuthors: Hui Wu , Yuanben Zhang , Zhonghe Han , Yingyan Hou , Lei Wang , Siye Liu , Qihang Gong , Yunping GeSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Short Text Classification (STC) is crucial for processing and comprehending the brief but substantial content prevalent on contemporary digital platforms. The STC encounters difficulties in grasping semantic and syntactic intricacies, an issue that is apparent in traditional pre-trained language models. Although Graph Convolutional Networks enhance performance by integrating external knowledge bases, these methods are limited by the quality and extent of the knowledge applied. Recently, the emergence of Large Language Models (LLMs) and Chain-of-Thought (CoT) has significantly improved the performance of complex reasoning tasks. However, some studies have highlighted the limitations of their application in fundamental NLP tasks. Consequently, this study sought to employ CoT to investigate the capabilities of LLMs in STC tasks. This study introduces Quartet Logic: A Four-Step Reasoning (QLFR) framework. This framework primarily incorporates Syntactic and Semantic Enrichment CoT, effectively decomposing the STC task into four distinct steps: (i) essential concept identification, (ii) common-sense knowledge retrieval, (iii) text rewriting, and (iv) classification. This elicits the inherent knowledge and abilities of LLMs to address the challenges in STC. Surprisingly, we found that QLFR can also improve the performance of smaller models. Therefore, we developed a CoT-Driven Multi-task learning (QLFR-CML) method to facilitate the knowledge transfer from LLMs to smaller models. Extensive experimentation across six short-text benchmarks validated the efficacy of the proposed methods. Notably, QLFR achieved state-of-the-art performance on all datasets, with significant improvements, particularly on the Ohsumed and TagMyNews datasets.
- [476] arXiv:2401.03160 (cross-list from cs.LG) [ pdf , other ]
-
Title: Human as AI Mentor: Enhanced Human-in-the-loop Reinforcement Learning for Safe and Efficient Autonomous DrivingComments: Accepted by Communications in Transportation ResearchSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both the safety of AVs and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient autonomous driving in mixed traffic platoon. Drawing inspiration from the human learning process, we first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM). In this paradigm, the human expert serves as a mentor to the AI agent. While allowing the agent to sufficiently explore uncertain environments, the human expert can take control in dangerous situations and demonstrate correct actions to avoid potential accidents. On the other hand, the agent could be guided to minimize traffic flow disturbance, thereby optimizing traffic flow efficiency. In detail, HAIM-DRL leverages data collected from free exploration and partial human demonstrations as its two training sources. Remarkably, we circumvent the intricate process of manually designing reward functions; instead, we directly derive proxy state-action values from partial human demonstrations to guide the agents' policy learning. Additionally, we employ a minimal intervention technique to reduce the human mentor's cognitive load. Comparative results show that HAIM-DRL outperforms traditional methods in driving safety, sampling efficiency, mitigation of traffic flow disturbance, and generalizability to unseen traffic scenarios. The code and demo videos for this paper can be accessed at: this https URL
- [477] arXiv:2401.03167 (cross-list from cs.CV) [ pdf , other ]
-
Title: PosDiffNet: Positional Neural Diffusion for Point Cloud Registration in a Large Field of View with PerturbationsAuthors: Rui She , Sijie Wang , Qiyu Kang , Kai Zhao , Yang Song , Wee Peng Tay , Tianyu Geng , Xingchao JianJournal-ref: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2024), Vancouver, Canada, 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Point cloud registration is a crucial technique in 3D computer vision with a wide range of applications. However, this task can be challenging, particularly in large fields of view with dynamic objects, environmental noise, or other perturbations. To address this challenge, we propose a model called PosDiffNet. Our approach performs hierarchical registration based on window-level, patch-level, and point-level correspondence. We leverage a graph neural partial differential equation (PDE) based on Beltrami flow to obtain high-dimensional features and position embeddings for point clouds. We incorporate position embeddings into a Transformer module based on a neural ordinary differential equation (ODE) to efficiently represent patches within points. We employ the multi-level correspondence derived from the high feature similarity scores to facilitate alignment between point clouds. Subsequently, we use registration methods such as SVD-based algorithms to predict the transformation using corresponding point pairs. We evaluate PosDiffNet on several 3D point cloud datasets, verifying that it achieves state-of-the-art (SOTA) performance for point cloud registration in large fields of view with perturbations. The implementation code of experiments is available at this https URL .
- [478] arXiv:2401.03171 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Exploration of Adolescent Depression Risk Prediction Based on Census Surveys and General Life IssuesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: In contemporary society, the escalating pressures of life and work have propelled psychological disorders to the forefront of modern health concerns, an issue that has been further accentuated by the COVID-19 pandemic. The prevalence of depression among adolescents is steadily increasing, and traditional diagnostic methods, which rely on scales or interviews, prove particularly inadequate for detecting depression in young people. Addressing these challenges, numerous AI-based methods for assisting in the diagnosis of mental health issues have emerged. However, most of these methods center around fundamental issues with scales or use multimodal approaches like facial expression recognition. Diagnosis of depression risk based on everyday habits and behaviors has been limited to small-scale qualitative studies. Our research leverages adolescent census data to predict depression risk, focusing on children's experiences with depression and their daily life situations. We introduced a method for managing severely imbalanced high-dimensional data and an adaptive predictive approach tailored to data structure characteristics. Furthermore, we proposed a cloud-based architecture for automatic online learning and data updates. This study utilized publicly available NSCH youth census data from 2020 to 2022, encompassing nearly 150,000 data entries. We conducted basic data analyses and predictive experiments, demonstrating significant performance improvements over standard machine learning and deep learning algorithms. This affirmed our data processing method's broad applicability in handling imbalanced medical data. Diverging from typical predictive method research, our study presents a comprehensive architectural solution, considering a wider array of user needs.
- [479] arXiv:2401.03175 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Part-of-Speech Tagger for Bodo Language using Deep Learning approachComments: Accepted to Natural Language EngineeringSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Language Processing systems such as Part-of-speech tagging, Named entity recognition, Machine translation, Speech recognition, and Language modeling (LM) are well-studied in high-resource languages. Nevertheless, research on these systems for several low-resource languages, including Bodo, Mizo, Nagamese, and others, is either yet to commence or is in its nascent stages. Language model plays a vital role in the downstream tasks of modern NLP. Extensive studies are carried out on LMs for high-resource languages. Nevertheless, languages such as Bodo, Rabha, and Mising continue to lack coverage. In this study, we first present BodoBERT, a language model for the Bodo language. To the best of our knowledge, this work is the first such effort to develop a language model for Bodo. Secondly, we present an ensemble DL-based POS tagging model for Bodo. The POS tagging model is based on combinations of BiLSTM with CRF and stacked embedding of BodoBERT with BytePairEmbeddings. We cover several language models in the experiment to see how well they work in POS tagging tasks. The best-performing model achieves an F1 score of 0.8041. A comparative experiment was also conducted on Assamese POS taggers, considering that the language is spoken in the same region as Bodo.
- [480] arXiv:2401.03190 (cross-list from cs.CL) [ pdf , other ]
-
Title: MPN: Leveraging Multilingual Patch Neuron for Cross-lingual Model EditingComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Large language models are known for encoding a vast amount of factual knowledge, but they often becomes outdated due to the ever-changing nature of external information. A promising solution to this challenge is the utilization of model editing methods to update the knowledge in an efficient manner. However, the majority of existing model editing techniques are limited to monolingual frameworks, thus failing to address the crucial issue of cross-lingual knowledge synchronization for multilingual models. To tackle this problem, we propose a simple yet effective method that trains multilingual patch neuron to store cross-lingual knowledge. It can be easily adapted to existing approaches to enhance their cross-lingual editing capabilities. To evaluate our method, we conduct experiments using both the XNLI dataset and a self-constructed XFEVER dataset. Experimental results demonstrate that our proposed method achieves improved performance in cross-lingual editing tasks without requiring excessive modifications to the original methodology, thereby showcasing its user-friendly characteristics. Codes will be released soon.
- [481] arXiv:2401.03196 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: SecureReg: A Combined Framework for Proactively Exposing Malicious Domain Name RegistrationsSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Rising cyber threats, with miscreants registering thousands of new domains daily for Internet-scale attacks like spam, phishing, and drive-by downloads, emphasize the need for innovative detection methods. This paper introduces a cutting-edge approach for identifying suspicious domains at the onset of the registration process. The accompanying data pipeline generates crucial features by comparing new domains to registered domains,emphasizing the crucial similarity score. Leveraging a novel combination of Natural Language Processing (NLP) techniques, including a pretrained Canine model, and Multilayer Perceptron (MLP) models, our system analyzes semantic and numerical attributes, providing a robust solution for early threat detection. This integrated approach significantly reduces the window of vulnerability, fortifying defenses against potential threats. The findings demonstrate the effectiveness of the integrated approach and contribute to the ongoing efforts in developing proactive strategies to mitigate the risks associated with illicit online activities through the early identification of suspicious domain registrations.
- [482] arXiv:2401.03214 (cross-list from cs.LG) [ pdf , other ]
-
Title: Understanding Representation Learnability of Nonlinear Self-Supervised LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Self-supervised learning (SSL) has empirically shown its data representation learnability in many downstream tasks. There are only a few theoretical works on data representation learnability, and many of those focus on final data representation, treating the nonlinear neural network as a ``black box". However, the accurate learning results of neural networks are crucial for describing the data distribution features learned by SSL models. Our paper is the first to analyze the learning results of the nonlinear SSL model accurately. We consider a toy data distribution that contains two features: the label-related feature and the hidden feature. Unlike previous linear setting work that depends on closed-form solutions, we use the gradient descent algorithm to train a 1-layer nonlinear SSL model with a certain initialization region and prove that the model converges to a local minimum. Furthermore, different from the complex iterative analysis, we propose a new analysis process which uses the exact version of Inverse Function Theorem to accurately describe the features learned by the local minimum. With this local minimum, we prove that the nonlinear SSL model can capture the label-related feature and hidden feature at the same time. In contrast, the nonlinear supervised learning (SL) model can only learn the label-related feature. We also present the learning processes and results of the nonlinear SSL and SL model via simulation experiments.
- [483] arXiv:2401.03221 (cross-list from cs.CV) [ pdf , other ]
-
Title: MirrorDiffusion: Stabilizing Diffusion Process in Zero-shot Image Translation by Prompts Redescription and BeyondComments: A prompt re-description strategy is proposed for stabilizing the diffusion model in image-to-image translation. Code and dataset page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recently, text-to-image diffusion models become a new paradigm in image processing fields, including content generation, image restoration and image-to-image translation. Given a target prompt, Denoising Diffusion Probabilistic Models (DDPM) are able to generate realistic yet eligible images. With this appealing property, the image translation task has the potential to be free from target image samples for supervision. By using a target text prompt for domain adaption, the diffusion model is able to implement zero-shot image-to-image translation advantageously. However, the sampling and inversion processes of DDPM are stochastic, and thus the inversion process often fail to reconstruct the input content. Specifically, the displacement effect will gradually accumulated during the diffusion and inversion processes, which led to the reconstructed results deviating from the source domain. To make reconstruction explicit, we propose a prompt redescription strategy to realize a mirror effect between the source and reconstructed image in the diffusion model (MirrorDiffusion). More specifically, a prompt redescription mechanism is investigated to align the text prompts with latent code at each time step of the Denoising Diffusion Implicit Models (DDIM) inversion to pursue a structure-preserving reconstruction. With the revised DDIM inversion, MirrorDiffusion is able to realize accurate zero-shot image translation by editing optimized text prompts and latent code. Extensive experiments demonstrate that MirrorDiffusion achieves superior performance over the state-of-the-art methods on zero-shot image translation benchmarks by clear margins and practical model stability.
- [484] arXiv:2401.03233 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Convergence Rate Maximization for Split Learning-based Control of EMG Prosthetic DevicesAuthors: Matea Marinova , Daniel Denkovski , Hristijan Gjoreski , Zoran Hadzi-Velkov , Valentin RakovicComments: 8 pages, 7 figures, corrected typosSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Split Learning (SL) is a promising Distributed Learning approach in electromyography (EMG) based prosthetic control, due to its applicability within resource-constrained environments. Other learning approaches, such as Deep Learning and Federated Learning (FL), provide suboptimal solutions, since prosthetic devices are extremely limited in terms of processing power and battery life. The viability of implementing SL in such scenarios is caused by its inherent model partitioning, with clients executing the smaller model segment. However, selecting an inadequate cut layer hinders the training process in SL systems. This paper presents an algorithm for optimal cut layer selection in terms of maximizing the convergence rate of the model. The performance evaluation demonstrates that the proposed algorithm substantially accelerates the convergence in an EMG pattern recognition task for improving prosthetic device control.
- [485] arXiv:2401.03238 (cross-list from cs.HC) [ pdf , other ]
-
Title: Using Large Language Models to Assess Tutors' Performance in Reacting to Students Making Math ErrorsComments: 13 page Workshop paper, AAAI2024 Workshop on AI for Education - Bridging Innovation and Responsibility, Tutoring, Tutor evaluation, Real-time feedback, Math learning, LLMs, GPT-4Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Research suggests that tutors should adopt a strategic approach when addressing math errors made by low-efficacy students. Rather than drawing direct attention to the error, tutors should guide the students to identify and correct their mistakes on their own. While tutor lessons have introduced this pedagogical skill, human evaluation of tutors applying this strategy is arduous and time-consuming. Large language models (LLMs) show promise in providing real-time assessment to tutors during their actual tutoring sessions, yet little is known regarding their accuracy in this context. In this study, we investigate the capacity of generative AI to evaluate real-life tutors' performance in responding to students making math errors. By analyzing 50 real-life tutoring dialogues, we find both GPT-3.5-Turbo and GPT-4 demonstrate proficiency in assessing the criteria related to reacting to students making errors. However, both models exhibit limitations in recognizing instances where the student made an error. Notably, GPT-4 tends to overidentify instances of students making errors, often attributing student uncertainty or inferring potential errors where human evaluators did not. Future work will focus on enhancing generalizability by assessing a larger dataset of dialogues and evaluating learning transfer. Specifically, we will analyze the performance of tutors in real-life scenarios when responding to students' math errors before and after lesson completion on this crucial tutoring skill.
- [486] arXiv:2401.03246 (cross-list from cs.LG) [ pdf , other ]
-
Title: SeqNAS: Neural Architecture Search for Event Sequence ClassificationAuthors: Igor Udovichenko , Egor Shvetsov , Denis Divitsky , Dmitry Osin , Ilya Trofimov , Anatoly Glushenko , Ivan Sukharev , Dmitry Berestenev , Evgeny BurnaevComments: in IEEE AccessSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Neural Architecture Search (NAS) methods are widely used in various industries to obtain high quality taskspecific solutions with minimal human intervention. Event Sequences find widespread use in various industrial applications including churn prediction customer segmentation fraud detection and fault diagnosis among others. Such data consist of categorical and real-valued components with irregular timestamps. Despite the usefulness of NAS methods previous approaches only have been applied to other domains images texts or time series. Our work addresses this limitation by introducing a novel NAS algorithm SeqNAS specifically designed for event sequence classification. We develop a simple yet expressive search space that leverages commonly used building blocks for event sequence classification including multihead self attention convolutions and recurrent cells. To perform the search we adopt sequential Bayesian Optimization and utilize previously trained models as an ensemble of teachers to augment knowledge distillation. As a result of our work we demonstrate that our method surpasses state of the art NAS methods and popular architectures suitable for sequence classification and holds great potential for various industrial applications.
- [487] arXiv:2401.03267 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Autonomous Navigation in Complex EnvironmentsComments: 7 pages, 3 figures, independent paperSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: This paper explores the application of CNN-DNN network fusion to construct a robot navigation controller within a simulated environment. The simulated environment is constructed to model a subterranean rescue situation, such that an autonomous agent is tasked with finding a goal within an unknown cavernous system. Imitation learning is used to train the control algorithm to use LiDAR and camera data to navigate the space and find the goal. The trained model is then tested for robustness using Monte-Carlo.
- [488] arXiv:2401.03275 (cross-list from cs.CV) [ pdf , other ]
-
Title: Real Time Human Detection by Unmanned Aerial VehiclesComments: 6 pages, 5 figures, 2 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: One of the most important problems in computer vision and remote sensing is object detection, which identifies particular categories of diverse things in pictures. Two crucial data sources for public security are the thermal infrared (TIR) remote sensing multi-scenario photos and videos produced by unmanned aerial vehicles (UAVs). Due to the small scale of the target, complex scene information, low resolution relative to the viewable videos, and dearth of publicly available labeled datasets and training models, their object detection procedure is still difficult. A UAV TIR object detection framework for pictures and videos is suggested in this study. The Forward-looking Infrared (FLIR) cameras used to gather ground-based TIR photos and videos are used to create the ``You Only Look Once'' (YOLO) model, which is based on CNN architecture. Results indicated that in the validating task, detecting human object had an average precision at IOU (Intersection over Union) = 0.5, which was 72.5\%, using YOLOv7 (YOLO version 7) state of the art model \cite{1}, while the detection speed around 161 frames per second (FPS/second). The usefulness of the YOLO architecture is demonstrated in the application, which evaluates the cross-detection performance of people in UAV TIR videos under a YOLOv7 model in terms of the various UAVs' observation angles. The qualitative and quantitative evaluation of object detection from TIR pictures and videos using deep-learning models is supported favorably by this work.
- [489] arXiv:2401.03301 (cross-list from cs.LG) [ pdf , other ]
-
Title: On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and BeyondComments: NeurIPS'23; Arxiv is the authors' preferred version; v2: add a missing related workSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: We seek to understand what facilitates sample-efficient learning from historical datasets for sequential decision-making, a problem that is popularly known as offline reinforcement learning (RL). Further, we are interested in algorithms that enjoy sample efficiency while leveraging (value) function approximation. In this paper, we address these fundamental questions by (i) proposing a notion of data diversity that subsumes the previous notions of coverage measures in offline RL and (ii) using this notion to {unify} three distinct classes of offline RL algorithms based on version spaces (VS), regularized optimization (RO), and posterior sampling (PS). We establish that VS-based, RO-based, and PS-based algorithms, under standard assumptions, achieve \emph{comparable} sample efficiency, which recovers the state-of-the-art sub-optimality bounds for finite and linear model classes with the standard assumptions. This result is surprising, given that the prior work suggested an unfavorable sample complexity of the RO-based algorithm compared to the VS-based algorithm, whereas posterior sampling is rarely considered in offline RL due to its explorative nature. Notably, our proposed model-free PS-based algorithm for offline RL is {novel}, with sub-optimality bounds that are {frequentist} (i.e., worst-case) in nature.
- [490] arXiv:2401.03306 (cross-list from cs.LG) [ pdf , other ]
-
Title: MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot LearningAuthors: Rafael Rafailov , Kyle Hatch , Victor Kolev , John D. Martin , Mariano Phielipp , Chelsea FinnComments: This is an updated version of a manuscript that originally appeared at CoRL 2023. The project website is here this https URLJournal-ref: Proceedings of The 7th Conference on Robot Learning, PMLR 229:3654-3671, 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations in the context of realistic robot tasks. Recent offline model-free approaches successfully use online fine-tuning to either improve the performance of the agent over the data collection policy or adapt to novel tasks. At the same time, model-based RL algorithms have achieved significant progress in sample efficiency and the complexity of the tasks they can solve, yet remain under-utilized in the fine-tuning setting. In this work, we argue that existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains due to issues with distribution shifts, off-dynamics data, and non-stationary rewards. We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization, while preventing model exploitation by controlling epistemic uncertainty. We find that our approach successfully solves tasks from the MetaWorld benchmark, as well as the Franka Kitchen robot manipulation environment completely from images. To the best of our knowledge, MOTO is the first method to solve this environment from pixels.
- [491] arXiv:2401.03310 (cross-list from cs.NI) [ pdf , other ]
-
Title: CAVIAR: Co-simulation of 6G Communications, 3D Scenarios and AI for Digital TwinsSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Digital twins are an important technology for advancing mobile communications, specially in use cases that require simultaneously simulating the wireless channel, 3D scenes and machine learning. Aiming at providing a solution to this demand, this work describes a modular co-simulation methodology called CAVIAR. Here, CAVIAR is upgraded to support a message passing library and enable the virtual counterpart of a digital twin system using different 6G-related simulators. The main contributions of this work are the detailed description of different CAVIAR architectures, the implementation of this methodology to assess a 6G use case of UAV-based search and rescue mission (SAR), and the generation of benchmarking data about the computational resource usage. For executing the SAR co-simulation we adopt five open-source solutions: the physical and link level network simulator Sionna, the simulator for autonomous vehicles AirSim, scikit-learn for training a decision tree for MIMO beam selection, Yolov8 for the detection of rescue targets and NATS for message passing. Results for the implemented SAR use case suggest that the methodology can run in a single machine, with the main demanded resources being the CPU processing and the GPU memory.
- [492] arXiv:2401.03312 (cross-list from cs.CV) [ pdf , other ]
-
Title: Exploiting Data Hierarchy as a New Modality for Contrastive LearningSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: This work investigates how hierarchically structured data can help neural networks learn conceptual representations of cathedrals. The underlying WikiScenes dataset provides a spatially organized hierarchical structure of cathedral components. We propose a novel hierarchical contrastive training approach that leverages a triplet margin loss to represent the data's spatial hierarchy in the encoder's latent space. As such, the proposed approach investigates if the dataset structure provides valuable information for self-supervised learning. We apply t-SNE to visualize the resultant latent space and evaluate the proposed approach by comparing it with other dataset-specific contrastive learning methods using a common downstream classification task. The proposed method outperforms the comparable weakly-supervised and baseline methods. Our findings suggest that dataset structure is a valuable modality for weakly-supervised learning.
- [493] arXiv:2401.03314 (cross-list from cs.CL) [ pdf , other ]
-
Title: Enhancing Context Through ContrastAuthors: Kshitij Ambilduke , Aneesh Shetye , Diksha Bagade , Rishika Bhagwatkar , Khurshed Fitter , Prasad Vagdargi , Shital ChiddarwarSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Neural machine translation benefits from semantically rich representations. Considerable progress in learning such representations has been achieved by language modelling and mutual information maximization objectives using contrastive learning. The language-dependent nature of language modelling introduces a trade-off between the universality of the learned representations and the model's performance on the language modelling tasks. Although contrastive learning improves performance, its success cannot be attributed to mutual information alone. We propose a novel Context Enhancement step to improve performance on neural machine translation by maximizing mutual information using the Barlow Twins loss. Unlike other approaches, we do not explicitly augment the data but view languages as implicit augmentations, eradicating the risk of disrupting semantic information. Further, our method does not learn embeddings from scratch and can be generalised to any set of pre-trained embeddings. Finally, we evaluate the language-agnosticism of our embeddings through language classification and use them for neural machine translation to compare with state-of-the-art approaches.
- [494] arXiv:2401.03315 (cross-list from cs.CR) [ pdf , other ]
-
Title: Malla: Demystifying Real-world Large Language Model Integrated Malicious ServicesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The underground exploitation of large language models (LLMs) for malicious services (i.e., Malla) is witnessing an uptick, amplifying the cyber threat landscape and posing questions about the trustworthiness of LLM technologies. However, there has been little effort to understand this new cybercrime, in terms of its magnitude, impact, and techniques. In this paper, we conduct the first systematic study on 212 real-world Mallas, uncovering their proliferation in underground marketplaces and exposing their operational modalities. Our study discloses the Malla ecosystem, revealing its significant growth and impact on today's public LLM services. Through examining 212 Mallas, we uncovered eight backend LLMs used by Mallas, along with 182 prompts that circumvent the protective measures of public LLM APIs. We further demystify the tactics employed by Mallas, including the abuse of uncensored LLMs and the exploitation of public LLM APIs through jailbreak prompts. Our findings enable a better understanding of the real-world exploitation of LLMs by cybercriminals, offering insights into strategies to counteract this cybercrime.
- [495] arXiv:2401.03319 (cross-list from cs.DC) [ pdf , other ]
-
Title: Comparison of Microservice Call Rate Predictions for Replication in the CloudAuthors: Narges Mehran , Arman Haghighi , Pedram Aminharati , Nikolay Nikolov , Ahmet Soylu , Dumitru Roman , Radu ProdanComments: 7 pages, 5 figures, 4 tablesSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: Today, many users deploy their microservice-based applications with various interconnections on a cluster of Cloud machines, subject to stochastic changes due to dynamic user requirements. To address this problem, we compare three machine learning (ML) models for predicting the microservice call rates based on the microservice times and aiming at estimating the scalability requirements. We apply the linear regression (LR), multilayer perception (MLP), and gradient boosting regression (GBR) models on the Alibaba microservice traces. The prediction results reveal that the LR model reaches a lower training time than the GBR and MLP models. However, the GBR reduces the mean absolute error and the mean absolute percentage error compared to LR and MLP models. Moreover, the prediction results show that the required number of replicas for each microservice by the gradient boosting model is close to the actual test data without any prediction.
- [496] arXiv:2401.03322 (cross-list from cs.LG) [ pdf , other ]
-
Title: Attention and Autoencoder Hybrid Model for Unsupervised Online Anomaly DetectionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: This paper introduces a hybrid attention and autoencoder (AE) model for unsupervised online anomaly detection in time series. The autoencoder captures local structural patterns in short embeddings, while the attention model learns long-term features, facilitating parallel computing with positional encoding. Unique in its approach, our proposed hybrid model combines attention and autoencoder for the first time in time series anomaly detection. It employs an attention-based mechanism, akin to the deep transformer model, with key architectural modifications for predicting the next time step window in the autoencoder's latent space. The model utilizes a threshold from the validation dataset for anomaly detection and introduces an alternative method based on analyzing the first statistical moment of error, improving accuracy without dependence on a validation dataset. Evaluation on diverse real-world benchmark datasets and comparing with other well-established models, confirms the effectiveness of our proposed model in anomaly detection.
- [497] arXiv:2401.03337 (cross-list from cs.RO) [ pdf , other ]
-
Title: MTAC: Hierarchical Reinforcement Learning-based Multi-gait Terrain-adaptive Quadruped ControllerComments: Submitted to ICRA2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Urban search and rescue missions require rapid first response to minimize loss of life and damage. Often, such efforts are assisted by humanitarian robots which need to handle dynamic operational conditions such as uneven and rough terrains, especially during mass casualty incidents like an earthquake. Quadruped robots, owing to their versatile design, have the potential to assist in such scenarios. However, control of quadruped robots in dynamic and rough terrain environments is a challenging problem due to the many degrees of freedom of these robots. Current locomotion controllers for quadrupeds are limited in their ability to produce multiple adaptive gaits, solve tasks in a time and resource-efficient manner, and require tedious training and manual tuning procedures. To address these challenges, we propose MTAC: a multi-gait terrain-adaptive controller, which utilizes a Hierarchical reinforcement learning (HRL) approach while being time and memory-efficient. We show that our proposed method scales well to a diverse range of environments with similar compute times as state-of-the-art methods. Our method showed greater than 75% on most tasks, outperforming previous work on the majority of test cases.
- [498] arXiv:2401.03343 (cross-list from cs.DL) [ pdf , ps , other ]
-
Title: Rediscovering Ranganathan: A Prismatic View of His Life through the Knowledge Graph SpectrumComments: 22 pages, 16 figuresSubjects: Digital Libraries (cs.DL) ; Artificial Intelligence (cs.AI)
Abstract: The present study puts forward a novel biographical knowledge graph (KG) on Prof. S. R. Ranganathan, one of the pioneering figures in the Library and Information Science (LIS) domain. It has been found that most of the relevant facts about Ranganathan exist in a variety of resources (e.g., books, essays, journal articles, websites, blogs, etc.), offering information in a fragmented and piecemeal way. With this dedicated KG (henceforth known as RKG), we hope to furnish a 360-degree view of his life and achievements. To the best of our knowledge, such a dedicated representation is unparalleled in its scope and coverage: using state-of-the-art technology for anyone to openly access, use/re-use, and contribute. Inspired by Ranganathan's theories and ideas, the KG was developed using a "facet-based methodology" at two levels: in the identification of the vital biographical aspects and the development of the ontological model. Finally, with this study, we call for a community-driven effort to enhance the KG and pay homage to the Father of Library Science on the hundredth anniversary of his revitalizing the LIS domain through his enduring participation.
- [499] arXiv:2401.03346 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: An Investigation of Large Language Models for Real-World Hate Speech DetectionAuthors: Keyan Guo , Alexander Hu , Jaden Mu , Ziheng Shi , Ziming Zhao , Nishant Vishwamitra , Hongxin HuComments: Accepted for publication on 22nd International Conference of Machine Learning and Applications, ICMLA 2023Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract: Hate speech has emerged as a major problem plaguing our social spaces today. While there have been significant efforts to address this problem, existing methods are still significantly limited in effectively detecting hate speech online. A major limitation of existing methods is that hate speech detection is a highly contextual problem, and these methods cannot fully capture the context of hate speech to make accurate predictions. Recently, large language models (LLMs) have demonstrated state-of-the-art performance in several natural language tasks. LLMs have undergone extensive training using vast amounts of natural language data, enabling them to grasp intricate contextual details. Hence, they could be used as knowledge bases for context-aware hate speech detection. However, a fundamental problem with using LLMs to detect hate speech is that there are no studies on effectively prompting LLMs for context-aware hate speech detection. In this study, we conduct a large-scale study of hate speech detection, employing five established hate speech datasets. We discover that LLMs not only match but often surpass the performance of current benchmark machine learning models in identifying hate speech. By proposing four diverse prompting strategies that optimize the use of LLMs in detecting hate speech. Our study reveals that a meticulously crafted reasoning prompt can effectively capture the context of hate speech by fully utilizing the knowledge base in LLMs, significantly outperforming existing techniques. Furthermore, although LLMs can provide a rich knowledge base for the contextual detection of hate speech, suitable prompting strategies play a crucial role in effectively leveraging this knowledge base for efficient detection.
- [500] arXiv:2401.03358 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: The HAPPY HEDGEHOG ProjectComments: 6 pagesSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Semi-autonomous machines, autonomous machines and robots inhabit closed, semi-closed and open environments, more structured environments like the household or more unstructured environments like cultural landscapes or the wilderness. There they encounter domestic animals, farm animals, working animals, and wild animals. These creatures could be disturbed, displaced, injured, or killed by the machines. Within the context of machine ethics and social robotics, the School of Business FHNW developed several design studies and prototypes for animal-friendly machines, which can be understood as moral and social machines in the spirit of these disciplines. In 2019-20, a team led by the main author developed a prototype robot lawnmower that can recognize hedgehogs, interrupt its work for them and thus protect them. Every year many of these animals die worldwide because of traditional service robots. HAPPY HEDGEHOG (HHH), as the invention is called, could be a solution to this problem. This article begins by providing an introduction to the background. Then it focuses on navigation (where the machine comes across certain objects that need to be recognized) and thermal and image recognition (with the help of machine learning) of the machine. It also presents obvious weaknesses and possible improvements. The results could be relevant for an industry that wants to market their products as animal-friendly machines.
- [501] arXiv:2401.03374 (cross-list from cs.SE) [ pdf , other ]
-
Title: LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic RewardAuthors: Nafis Tanveer Islam , Joseph Khoury , Andrew Seong , Mohammad Bahrami Karkevandi , Gonzalo De La Torre Parra , Elias Bou-Harb , Peyman NajafiradSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: In software development, the predominant emphasis on functionality often supersedes security concerns, a trend gaining momentum with AI-driven automation tools like GitHub Copilot. These tools significantly improve developers' efficiency in functional code development. Nevertheless, it remains a notable concern that such tools are also responsible for creating insecure code, predominantly because of pre-training on publicly available repositories with vulnerable code. Moreover, developers are called the "weakest link in the chain" since they have very minimal knowledge of code security. Although existing solutions provide a reasonable solution to vulnerable code, they must adequately describe and educate the developers on code security to ensure that the security issues are not repeated. Therefore we introduce a multipurpose code vulnerability analysis system \texttt{SecRepair}, powered by a large language model, CodeGen2 assisting the developer in identifying and generating fixed code along with a complete description of the vulnerability with a code comment. Our innovative methodology uses a reinforcement learning paradigm to generate code comments augmented by a semantic reward mechanism. Inspired by how humans fix code issues, we propose an instruction-based dataset suitable for vulnerability analysis with LLMs. We further identify zero-day and N-day vulnerabilities in 6 Open Source IoT Operating Systems on GitHub. Our findings underscore that incorporating reinforcement learning coupled with semantic reward augments our model's performance, thereby fortifying its capacity to address code vulnerabilities with improved efficacy.
- [502] arXiv:2401.03397 (cross-list from cs.LG) [ pdf , other ]
-
Title: Predicting the Skies: A Novel Model for Flight-Level Passenger Traffic ForecastingComments: 9 pages, 6 figures, to be publishedSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Applications (stat.AP)
Abstract: Accurate prediction of flight-level passenger traffic is of paramount importance in airline operations, influencing key decisions from pricing to route optimization. This study introduces a novel, multimodal deep learning approach to the challenge of predicting flight-level passenger traffic, yielding substantial accuracy improvements compared to traditional models. Leveraging an extensive dataset from American Airlines, our model ingests historical traffic data, fare closure information, and seasonality attributes specific to each flight. Our proposed neural network integrates the strengths of Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN), exploiting the temporal patterns and spatial relationships within the data to enhance prediction performance. Crucial to the success of our model is a comprehensive data processing strategy. We construct 3D tensors to represent data, apply careful masking strategies to mirror real-world dynamics, and employ data augmentation techniques to enrich the diversity of our training set. The efficacy of our approach is borne out in the results: our model demonstrates an approximate 33\% improvement in Mean Squared Error (MSE) compared to traditional benchmarks. This study, therefore, highlights the significant potential of deep learning techniques and meticulous data processing in advancing the field of flight traffic prediction.
- [503] arXiv:2401.03406 (cross-list from cs.RO) [ pdf , other ]
-
Title: Improving Dribbling, Passing, and Marking Actions in Soccer Simulation 2D Games Using Machine LearningAuthors: Nader Zare , Omid Amini , Aref Sayareh , Mahtab Sarvmaili , Arad Firouzkouhi , Stan Matwin , Amilcar SoaresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The RoboCup competition was started in 1997, and is known as the oldest RoboCup league. The RoboCup 2D Soccer Simulation League is a stochastic, partially observable soccer environment in which 24 autonomous agents play on two opposing teams. In this paper, we detail the main strategies and functionalities of CYRUS, the RoboCup 2021 2D Soccer Simulation League champions. The new functionalities presented and discussed in this work are (i) Multi Action Dribble, (ii) Pass Prediction and (iii) Marking Decision. The Multi Action Dribbling strategy enabled CYRUS to succeed more often and to be safer when dribbling actions were performed during a game. The Pass Prediction enhanced our gameplay by predicting our teammate's passing behavior, anticipating and making our agents collaborate better towards scoring goals. Finally, the Marking Decision addressed the multi-agent matching problem to improve CYRUS defensive strategy by finding an optimal solution to mark opponents' players.
- [504] arXiv:2401.03424 (cross-list from cs.SD) [ pdf , other ]
-
Title: MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech RecognitionComments: 5 pages, 3 figures Accepted at ICASSP 2024Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness. However, current studies mainly focus on fusing the well-learned modality features, like the output of modality-specific encoders, without considering the contextual relationship during the modality feature learning. In this study, we propose a multi-layer cross-attention fusion based AVSR (MLCA-AVSR) approach that promotes representation learning of each modality by fusing them at different levels of audio/visual encoders. Experimental results on the MISP2022-AVSR Challenge dataset show the efficacy of our proposed system, achieving a concatenated minimum permutation character error rate (cpCER) of 30.57% on the Eval set and yielding up to 3.17% relative improvement compared with our previous system which ranked the second place in the challenge. Following the fusion of multiple systems, our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
- [505] arXiv:2401.03426 (cross-list from cs.CL) [ pdf , other ]
-
Title: On Leveraging Large Language Models for Enhancing Entity ResolutionAuthors: Huahang Li , Longyu Feng , Shuangyin Li , Fei Hao , Chen Jason Zhang , Yuanfeng Song , Lei ChenComments: 12 pages,6 figures, ICDE 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Entity resolution, the task of identifying and consolidating records that pertain to the same real-world entity, plays a pivotal role in various sectors such as e-commerce, healthcare, and law enforcement. The emergence of Large Language Models (LLMs) like GPT-4 has introduced a new dimension to this task, leveraging their advanced linguistic capabilities. This paper explores the potential of LLMs in the entity resolution process, shedding light on both their advantages and the computational complexities associated with large-scale matching. We introduce strategies for the efficient utilization of LLMs, including the selection of an optimal set of matching questions, namely MQsSP, which is proved to be a NP-hard problem. Our approach optimally chooses the most effective matching questions while keep consumption limited to your budget . Additionally, we propose a method to adjust the distribution of possible partitions after receiving responses from LLMs, with the goal of reducing the uncertainty of entity resolution. We evaluate the effectiveness of our approach using entropy as a metric, and our experimental results demonstrate the efficiency and effectiveness of our proposed methods, offering promising prospects for real-world applications.
- [506] arXiv:2401.03461 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Amplification of Addictive New Media Features in the MetaverseComments: 14 pages, 1 figure, 1 tableSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: The emergence of the metaverse, envisioned as a hyperreal virtual universe facilitating boundless human interaction, stands to revolutionize our conception of media, with significant impacts on addiction, creativity, relationships, and social polarization. This paper aims to dissect the addictive potential of the metaverse due to its immersive and interactive features, scrutinize the effects of its recommender systems on creativity and social polarization, and explore potential consequences stemming from the metaverse development. We employed a literature review methodology, drawing parallels from the research on new media platforms and examining the progression of reality-mimicking features in media from historical perspectives to understand this transformative digital frontier. The findings suggest that these immersive and interactive features could potentially exacerbate media addiction. The designed recommender systems, while aiding personalization and user engagement, might contribute to social polarization and affect the diversity of creative output. However, our conclusions are based primarily on theoretical propositions from studies conducted on existing media platforms and lack empirical support specific to the metaverse. Therefore, this paper identifies a critical gap requiring further research, through empirical studies focused on metaverse use and addiction and exploration of privacy, security, and ethical implications associated with this burgeoning digital universe. As the development of the metaverse accelerates, it is incumbent on scholars, technologists, and policymakers to navigate its multilayered impacts thoughtfully to balance innovation with societal well-being.
- [507] arXiv:2401.03462 (cross-list from cs.CL) [ pdf , other ]
-
Title: Soaring from 4K to 400K: Extending LLM's Context with Activation BeaconSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The utilization of long contexts poses a big challenge for LLMs due to their limited context window size. Although the context window can be extended through fine-tuning, it will result in a considerable cost at both training and inference time, and exert an unfavorable impact to the LLM's original capabilities. In this work, we propose a new method called Activation Beacon, which condenses LLM's raw activations into compact forms such that the LLM can perceive a longer context with a limited context window. Activation Beacon is introduced as a plug-in module, which fully preserves the LLM's original capability in short contexts. It works with the sliding window to streamingly process the long context, which leads to a competitive memory and time efficiency in both training and inference. Activation Beacon is trained with short-sequence data of diversified condensing ratios. Thanks to such a treatment, it can be effectively learned to support different context lengths with a small training cost. Our experiment verifies Activation Beacon's effectiveness of context extension: it can remarkably accomplish high-quality extension of Llama-2-7B's context by $\times100$ times (from 4K to 400K); meanwhile, it can also achieve superior performances across a variety of long-context language modeling and understanding tasks. The source code and model checkpoint are available at \url{ this https URL }.
- [508] arXiv:2401.03467 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Maintaining Journalistic Integrity in the Digital Age: A Comprehensive NLP Framework for Evaluating Online News ContentComments: 21 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The rapid growth of online news platforms has led to an increased need for reliable methods to evaluate the quality and credibility of news articles. This paper proposes a comprehensive framework to analyze online news texts using natural language processing (NLP) techniques, particularly a language model specifically trained for this purpose, alongside other well-established NLP methods. The framework incorporates ten journalism standards-objectivity, balance and fairness, readability and clarity, sensationalism and clickbait, ethical considerations, public interest and value, source credibility, relevance and timeliness, factual accuracy, and attribution and transparency-to assess the quality of news articles. By establishing these standards, researchers, media organizations, and readers can better evaluate and understand the content they consume and produce. The proposed method has some limitations, such as potential difficulty in detecting subtle biases and the need for continuous updating of the language model to keep pace with evolving language patterns.
- [509] arXiv:2401.03469 (cross-list from cs.SE) [ pdf , other ]
-
Title: Efficient Test Data Generation for MC/DC with OCL and SearchSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: System-level testing of avionics software systems requires compliance with different international safety standards such as DO-178C. An important consideration of the avionics industry is automated test data generation according to the criteria suggested by safety standards. One of the recommended criteria by DO-178C is the modified condition/decision coverage (MC/DC) criterion. The current model-based test data generation approaches use constraints written in Object Constraint Language (OCL), and apply search techniques to generate test data. These approaches either do not support MC/DC criterion or suffer from performance issues while generating test data for large-scale avionics systems. In this paper, we propose an effective way to automate MC/DC test data generation during model-based testing. We develop a strategy that utilizes case-based reasoning (CBR) and range reduction heuristics designed to solve MC/DC-tailored OCL constraints. We performed an empirical study to compare our proposed strategy for MC/DC test data generation using CBR, range reduction, both CBR and range reduction, with an original search algorithm, and random search. We also empirically compared our strategy with existing constraint-solving approaches. The results show that both CBR and range reduction for MC/DC test data generation outperform the baseline approach. Moreover, the combination of both CBR and range reduction for MC/DC test data generation is an effective approach compared to existing constraint solvers.
- [510] arXiv:2401.03470 (cross-list from cs.CV) [ pdf , other ]
-
Title: FurniScene: A Large-scale 3D Room Dataset with Intricate Furnishing ScenesAuthors: Genghao Zhang , Yuxi Wang , Chuanchen Luo , Shibiao Xu , Junran Peng , Zhaoxiang Zhang , Man ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Indoor scene generation has attracted significant attention recently as it is crucial for applications of gaming, virtual reality, and interior design. Current indoor scene generation methods can produce reasonable room layouts but often lack diversity and realism. This is primarily due to the limited coverage of existing datasets, including only large furniture without tiny furnishings in daily life. To address these challenges, we propose FurniScene, a large-scale 3D room dataset with intricate furnishing scenes from interior design professionals. Specifically, the FurniScene consists of 11,698 rooms and 39,691 unique furniture CAD models with 89 different types, covering things from large beds to small teacups on the coffee table. To better suit fine-grained indoor scene layout generation, we introduce a novel Two-Stage Diffusion Scene Model (TSDSM) and conduct an evaluation benchmark for various indoor scene generation based on FurniScene. Quantitative and qualitative evaluations demonstrate the capability of our method to generate highly realistic indoor scenes. Our dataset and code will be publicly available soon.
- [511] arXiv:2401.03473 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition ChallengeAuthors: He Wang , Pengcheng Guo , Yue Li , Ao Zhang , Jiayao Sun , Lei Xie , Wei Chen , Pan Zhou , Hui Bu , Xin Xu , Binbin Zhang , Zhuo Chen , Jian Wu , Longbiao Wang , Eng Siong Chng , Sun LiComments: Accepted at ICASSP 2024Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks, including automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR) are set up, using character error rate (CER) and concatenated minimum permutation character error rate (cpCER) as evaluation metrics, respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results in both tracks. In the end, first-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, showing an absolute improvement of 13.08% and 51.4% compared to our challenge baseline, respectively.
- [512] arXiv:2401.03476 (cross-list from cs.MM) [ pdf , other ]
-
Title: Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker NaturalnessAuthors: Sicheng Yang , Zunnan Xu , Haiwei Xue , Yongkang Cheng , Shaoli Huang , Mingming Gong , Zhiyong WuComments: 6 pages, 3 figures, ICASSP 2024Subjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: Current talking avatars mostly generate co-speech gestures based on audio and text of the utterance, without considering the non-speaking motion of the speaker. Furthermore, previous works on co-speech gesture generation have designed network structures based on individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements. To tackle these issues, we introduce FreeTalker, which, to the best of our knowledge, is the first framework for the generation of both spontaneous (e.g., co-speech gesture) and non-spontaneous (e.g., moving around the podium) speaker motions. Specifically, we train a diffusion-based model for speaker motion generation that employs unified representations of both speech-driven gestures and text-driven motions, utilizing heterogeneous data sourced from various motion datasets. During inference, we utilize classifier-free guidance to highly control the style in the clips. Additionally, to create smooth transitions between clips, we utilize DoubleTake, a method that leverages a generative prior and ensures seamless motion blending. Extensive experiments show that our method generates natural and controllable speaker movements. Our code, model, and demo are are available at \url{ this https URL }.
- [513] arXiv:2401.03489 (cross-list from cs.LG) [ pdf , other ]
-
Title: Decentralized Federated Policy Gradient with Byzantine Fault-Tolerance and Provably Fast ConvergenceComments: Accepted at AAMAS'24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Multiagent Systems (cs.MA)
Abstract: In Federated Reinforcement Learning (FRL), agents aim to collaboratively learn a common task, while each agent is acting in its local environment without exchanging raw trajectories. Existing approaches for FRL either (a) do not provide any fault-tolerance guarantees (against misbehaving agents), or (b) rely on a trusted central agent (a single point of failure) for aggregating updates. We provide the first decentralized Byzantine fault-tolerant FRL method. Towards this end, we first propose a new centralized Byzantine fault-tolerant policy gradient (PG) algorithm that improves over existing methods by relying only on assumptions standard for non-fault-tolerant PG. Then, as our main contribution, we show how a combination of robust aggregation and Byzantine-resilient agreement methods can be leveraged in order to eliminate the need for a trusted central entity. Since our results represent the first sample complexity analysis for Byzantine fault-tolerant decentralized federated non-convex optimization, our technical contributions may be of independent interest. Finally, we corroborate our theoretical results experimentally for common RL environments, demonstrating the speed-up of decentralized federations w.r.t. the number of participating agents and resilience against various Byzantine attacks.
- [514] arXiv:2401.03499 (cross-list from cs.CV) [ pdf , other ]
-
Title: Re:Draw -- Context Aware Translation as a Controllable Method for Artistic ProductionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM)
Abstract: We introduce context-aware translation, a novel method that combines the benefits of inpainting and image-to-image translation, respecting simultaneously the original input and contextual relevance -- where existing methods fall short. By doing so, our method opens new avenues for the controllable use of AI within artistic creation, from animation to digital art.
As an use case, we apply our method to redraw any hand-drawn animated character eyes based on any design specifications - eyes serve as a focal point that captures viewer attention and conveys a range of emotions, however, the labor-intensive nature of traditional animation often leads to compromises in the complexity and consistency of eye design. Furthermore, we remove the need for production data for training and introduce a new character recognition method that surpasses existing work by not requiring fine-tuning to specific productions. This proposed use case could help maintain consistency throughout production and unlock bolder and more detailed design choices without the production cost drawbacks. A user study shows context-aware translation is preferred over existing work 95.16% of the time. - [515] arXiv:2401.03512 (cross-list from cs.CL) [ pdf , other ]
-
Title: CharPoet: A Chinese Classical Poetry Generation System Based on Token-free LLMSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Automatic Chinese classical poetry generation has attracted much research interest, but achieving effective control over format and content simultaneously remains challenging. Traditional systems usually accept keywords as user inputs, resulting in limited control over content. Large language models (LLMs) improve content control by allowing unrestricted user instructions, but the token-by-token generation process frequently makes format errors. Motivated by this, we propose CharPoet, a Chinese classical poetry generation system based on token-free LLM, which provides effective control over both format and content. Our token-free architecture generates in a character-by-character manner, enabling precise control over the number of characters. Pruned from existing token-based LLMs, CharPoet inherits their pretrained capabilities and can generate poetry following instructions like "Write me a poem for my mother's birthday." CharPoet achieves format accuracy above 0.96, outperforming Jiuge-GPT-2 (0.91) and GPT-4 (0.38). In terms of content quality, CharPoet surpasses traditional systems including Jiuge, and is comparable to other LLMs. Our system is open source and available at this https URL . A video demonstration of CharPoet is available at this https URL .
- [516] arXiv:2401.03531 (cross-list from cs.AR) [ pdf , other ]
-
Title: A Heterogeneous RISC-V based SoC for Secure Nano-UAV NavigationAuthors: Luca Valente , Alessandro Nadalini , Asif Veeran , Mattia Sinigaglia , Bruno Sa , Nils Wistoff , Yvan Tortorella , Simone Benatti , Rafail Psiakis , Ari Kulmala , Baker Mohammad , Sandro Pinto , Daniele Palossi , Luca Benini , Davide RossiSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: The rapid advancement of energy-efficient parallel ultra-low-power (ULP) ucontrollers units (MCUs) is enabling the development of autonomous nano-sized unmanned aerial vehicles (nano-UAVs). These sub-10cm drones represent the next generation of unobtrusive robotic helpers and ubiquitous smart sensors. However, nano-UAVs face significant power and payload constraints while requiring advanced computing capabilities akin to standard drones, including real-time Machine Learning (ML) performance and the safe co-existence of general-purpose and real-time OSs. Although some advanced parallel ULP MCUs offer the necessary ML computing capabilities within the prescribed power limits, they rely on small main memories (<1MB) and ucontroller-class CPUs with no virtualization or security features, and hence only support simple bare-metal runtimes. In this work, we present Shaheen, a 9mm2 200mW SoC implemented in 22nm FDX technology. Differently from state-of-the-art MCUs, Shaheen integrates a Linux-capable RV64 core, compliant with the v1.0 ratified Hypervisor extension and equipped with timing channel protection, along with a low-cost and low-power memory controller exposing up to 512MB of off-chip low-cost low-power HyperRAM directly to the CPU. At the same time, it integrates a fully programmable energy- and area-efficient multi-core cluster of RV32 cores optimized for general-purpose DSP as well as reduced- and mixed-precision ML. To the best of the authors' knowledge, it is the first silicon prototype of a ULP SoC coupling the RV64 and RV32 cores in a heterogeneous host+accelerator architecture fully based on the RISC-V ISA. We demonstrate the capabilities of the proposed SoC on a wide range of benchmarks relevant to nano-UAV applications. The cluster can deliver up to 90GOp/s and up to 1.8TOp/s/W on 2-bit integer kernels and up to 7.9GFLOp/s and up to 150GFLOp/s/W on 16-bit FP kernels.
- [517] arXiv:2401.03545 (cross-list from cs.DL) [ pdf , other ]
-
Title: Is there really a Citation Age Bias in NLP?Subjects: Digital Libraries (cs.DL) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Citations are a key ingredient of scientific research to relate a paper to others published in the community. Recently, it has been noted that there is a citation age bias in the Natural Language Processing (NLP) community, one of the currently fastest growing AI subfields, in that the mean age of the bibliography of NLP papers has become ever younger in the last few years, leading to `citation amnesia' in which older knowledge is increasingly forgotten. In this work, we put such claims into perspective by analyzing the bibliography of $\sim$300k papers across 15 different scientific fields submitted to the popular preprint server Arxiv in the time period from 2013 to 2022. We find that all AI subfields (in particular: cs.AI, cs.CL, cs.CV, cs.LG) have similar trends of citation amnesia, in which the age of the bibliography has roughly halved in the last 10 years (from above 12 in 2013 to below 7 in 2022), on average. Rather than diagnosing this as a citation age bias in the NLP community, we believe this pattern is an artefact of the dynamics of these research fields, in which new knowledge is produced in ever shorter time intervals.
- [518] arXiv:2401.03552 (cross-list from cs.CR) [ pdf , other ]
-
Title: Privacy-Preserving in Blockchain-based Federated Learning SystemsAuthors: Sameera K. M. , Serena Nicolazzo , Marco Arazzi , Antonino Nocera , Rafidha Rehiman K. A. , Vinod P , Mauro ContiComments: 44 pages, 11 figuresSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Federated Learning (FL) has recently arisen as a revolutionary approach to collaborative training Machine Learning models. According to this novel framework, multiple participants train a global model collaboratively, coordinating with a central aggregator without sharing their local data. As FL gains popularity in diverse domains, security, and privacy concerns arise due to the distributed nature of this solution. Therefore, integrating this strategy with Blockchain technology has been consolidated as a preferred choice to ensure the privacy and security of participants.
This paper explores the research efforts carried out by the scientific community to define privacy solutions in scenarios adopting Blockchain-Enabled FL. It comprehensively summarizes the background related to FL and Blockchain, evaluates existing architectures for their integration, and the primary attacks and possible countermeasures to guarantee privacy in this setting. Finally, it reviews the main application scenarios where Blockchain-Enabled FL approaches have been proficiently applied. This survey can help academia and industry practitioners understand which theories and techniques exist to improve the performance of FL through Blockchain to preserve privacy and which are the main challenges and future directions in this novel and still under-explored context. We believe this work provides a novel contribution respect to the previous surveys and is a valuable tool to explore the current landscape, understand perspectives, and pave the way for advancements or improvements in this amalgamation of Blockchain and Federated Learning. - [519] arXiv:2401.03562 (cross-list from cs.LG) [ pdf , other ]
-
Title: GLOCALFAIR: Jointly Improving Global and Local Group Fairness in Federated LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Federated learning (FL) has emerged as a prospective solution for collaboratively learning a shared model across clients without sacrificing their data privacy. However, the federated learned model tends to be biased against certain demographic groups (e.g., racial and gender groups) due to the inherent FL properties, such as data heterogeneity and party selection. Unlike centralized learning, mitigating bias in FL is particularly challenging as private training datasets and their sensitive attributes are typically not directly accessible. Most prior research in this field only focuses on global fairness while overlooking the local fairness of individual clients. Moreover, existing methods often require sensitive information about the client's local datasets to be shared, which is not desirable. To address these issues, we propose GLOCALFAIR, a client-server co-design fairness framework that can jointly improve global and local group fairness in FL without the need for sensitive statistics about the client's private datasets. Specifically, we utilize constrained optimization to enforce local fairness on the client side and adopt a fairness-aware clustering-based aggregation on the server to further ensure the global model fairness across different sensitive groups while maintaining high utility. Experiments on two image datasets and one tabular dataset with various state-of-the-art fairness baselines show that GLOCALFAIR can achieve enhanced fairness under both global and local data distributions while maintaining a good level of utility and client fairness.
- [520] arXiv:2401.03581 (cross-list from cs.HC) [ pdf , other ]
-
Title: Evaluating and Personalizing User-Perceived Quality of Text-to-Speech Voices for Delivering Mindfulness Meditation with Different Physical EmbodimentsAuthors: Zhonghao Shi , Han Chen , Anna-Maria Velentza , Siqi Liu , Nathaniel Dennler , Allison O'Connell , Maja MatarićComments: Published in Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, pp. 516-524. 2023Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Mindfulness-based therapies have been shown to be effective in improving mental health, and technology-based methods have the potential to expand the accessibility of these therapies. To enable real-time personalized content generation for mindfulness practice in these methods, high-quality computer-synthesized text-to-speech (TTS) voices are needed to provide verbal guidance and respond to user performance and preferences. However, the user-perceived quality of state-of-the-art TTS voices has not yet been evaluated for administering mindfulness meditation, which requires emotional expressiveness. In addition, work has not yet been done to study the effect of physical embodiment and personalization on the user-perceived quality of TTS voices for mindfulness. To that end, we designed a two-phase human subject study. In Phase 1, an online Mechanical Turk between-subject study (N=471) evaluated 3 (feminine, masculine, child-like) state-of-the-art TTS voices with 2 (feminine, masculine) human therapists' voices in 3 different physical embodiment settings (no agent, conversational agent, socially assistive robot) with remote participants. Building on findings from Phase 1, in Phase 2, an in-person within-subject study (N=94), we used a novel framework we developed for personalizing TTS voices based on user preferences, and evaluated user-perceived quality compared to best-rated non-personalized voices from Phase 1. We found that the best-rated human voice was perceived better than all TTS voices; the emotional expressiveness and naturalness of TTS voices were poorly rated, while users were satisfied with the clarity of TTS voices. Surprisingly, by allowing users to fine-tune TTS voice features, the user-personalized TTS voices could perform almost as well as human voices, suggesting user personalization could be a simple and very effective tool to improve user-perceived quality of TTS voice.
- [521] arXiv:2401.03587 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Big Data and Deep Learning in Smart Cities: A Comprehensive Dataset for AI-Driven Traffic Accident Detection and Computer Vision SystemsAuthors: Victor Adewopo , Nelly Elsayed , Zag Elsayed , Murat Ozer , Constantinos Zekios , Ahmed Abdelgawad , Magdy BayoumiSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In the dynamic urban landscape, where the interplay of vehicles and pedestrians defines the rhythm of life, integrating advanced technology for safety and efficiency is increasingly crucial. This study delves into the application of cutting-edge technological methods in smart cities, focusing on enhancing public safety through improved traffic accident detection. Action recognition plays a pivotal role in interpreting visual data and tracking object motion such as human pose estimation in video sequences. The challenges of action recognition include variability in rapid actions, limited dataset, and environmental factors such as (Weather, Illumination, and Occlusions). In this paper, we present a novel comprehensive dataset for traffic accident detection. This datasets is specifically designed to bolster computer vision and action recognition systems in predicting and detecting road traffic accidents. We integrated datasets from wide variety of data sources, road networks, weather conditions, and regions across the globe. This approach is underpinned by empirical studies, aiming to contribute to the discourse on how technology can enhance the quality of life in densely populated areas. This research aims to bridge existing research gaps by introducing benchmark datasets that leverage state-of-the-art algorithms tailored for traffic accident detection in smart cities. These dataset is expected to advance academic research and also enhance real-time accident detection applications, contributing significantly to the evolution of smart urban environments. Our study marks a pivotal step towards safer, more efficient smart cities, harnessing the power of AI and machine learning to transform urban living.
- [522] arXiv:2401.03597 (cross-list from cs.LG) [ pdf , other ]
-
Title: Few-Shot Causal Representation Learning for Out-of-Distribution Generalization on Heterogeneous GraphsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Heterogeneous graph few-shot learning (HGFL) has been developed to address the label sparsity issue in heterogeneous graphs (HGs), which consist of various types of nodes and edges. The core concept of HGFL is to extract knowledge from rich-labeled classes in a source HG, transfer this knowledge to a target HG to facilitate learning new classes with few-labeled training data, and finally make predictions on unlabeled testing data. Existing methods typically assume that the source HG, training data, and testing data all share the same distribution. However, in practice, distribution shifts among these three types of data are inevitable due to two reasons: (1) the limited availability of the source HG that matches the target HG distribution, and (2) the unpredictable data generation mechanism of the target HG. Such distribution shifts result in ineffective knowledge transfer and poor learning performance in existing methods, thereby leading to a novel problem of out-of-distribution (OOD) generalization in HGFL. To address this challenging problem, we propose a novel Causal OOD Heterogeneous graph Few-shot learning model, namely COHF. In COHF, we first characterize distribution shifts in HGs with a structural causal model, establishing an invariance principle for OOD generalization in HGFL. Then, following this invariance principle, we propose a new variational autoencoder-based heterogeneous graph neural network to mitigate the impact of distribution shifts. Finally, by integrating this network with a novel meta-learning framework, COHF effectively transfers knowledge to the target HG to predict new classes with few-labeled data. Extensive experiments on seven real-world datasets have demonstrated the superior performance of COHF over the state-of-the-art methods.
- [523] arXiv:2401.03599 (cross-list from cs.RO) [ pdf , other ]
-
Title: Disentangled Neural Relational Inference for Interpretable Motion PredictionJournal-ref: IEEE Robotics and Automation Letters, Date: FEBRUARY 2024 , Volume: 9, Issue: 2, ISSN: 2377-3766, pp1452-1459Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Effective interaction modeling and behavior prediction of dynamic agents play a significant role in interactive motion planning for autonomous robots. Although existing methods have improved prediction accuracy, few research efforts have been devoted to enhancing prediction model interpretability and out-of-distribution (OOD) generalizability. This work addresses these two challenging aspects by designing a variational auto-encoder framework that integrates graph-based representations and time-sequence models to efficiently capture spatio-temporal relations between interactive agents and predict their dynamics. Our model infers dynamic interaction graphs in a latent space augmented with interpretable edge features that characterize the interactions. Moreover, we aim to enhance model interpretability and performance in OOD scenarios by disentangling the latent space of edge features, thereby strengthening model versatility and robustness. We validate our approach through extensive experiments on both simulated and real-world datasets. The results show superior performance compared to existing methods in modeling spatio-temporal relations, motion prediction, and identifying time-invariant latent features.
- [524] arXiv:2401.03601 (cross-list from cs.CL) [ pdf , other ]
-
Title: InFoBench: Evaluating Instruction Following Ability in Large Language ModelsAuthors: Yiwei Qin , Kaiqiang Song , Yebowen Hu , Wenlin Yao , Sangwoo Cho , Xiaoyang Wang , Xuansheng Wu , Fei Liu , Pengfei Liu , Dong YuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces the Decomposed Requirements Following Ratio (DRFR), a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions. Addressing a gap in current methodologies, DRFR breaks down complex instructions into simpler criteria, facilitating a detailed analysis of LLMs' compliance with various aspects of tasks. Alongside this metric, we present InFoBench, a benchmark comprising 500 diverse instructions and 2,250 decomposed questions across multiple constraint categories. Our experiments compare DRFR with traditional scoring methods and explore annotation sources, including human experts, crowd-sourced workers, and GPT-4. The findings demonstrate DRFR's higher reliability and the effectiveness of using GPT-4 as a cost-efficient annotator. The evaluation of several advanced LLMs using this framework reveals their strengths and areas needing improvement, particularly in complex instruction-following. This study contributes a novel metric and benchmark, offering insights for future LLM development and evaluation.
- [525] arXiv:2401.03605 (cross-list from cs.IR) [ pdf , other ]
-
Title: ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with FeedbackSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Recommendation algorithms have been pivotal in handling the overwhelming volume of online content. However, these algorithms seldom consider direct user input, resulting in superficial interaction between them. Efforts have been made to include the user directly in the recommendation process through conversation, but these systems too have had limited interactivity. Recently, Large Language Models (LLMs) like ChatGPT have gained popularity due to their ease of use and their ability to adapt dynamically to various tasks while responding to feedback. In this paper, we investigate the effectiveness of ChatGPT as a top-n conversational recommendation system. We build a rigorous pipeline around ChatGPT to simulate how a user might realistically probe the model for recommendations: by first instructing and then reprompting with feedback to refine a set of recommendations. We further explore the effect of popularity bias in ChatGPT's recommendations, and compare its performance to baseline models. We find that reprompting ChatGPT with feedback is an effective strategy to improve recommendation relevancy, and that popularity bias can be mitigated through prompt engineering.
- [526] arXiv:2401.03609 (cross-list from cs.LG) [ pdf , other ]
-
Title: Multi-Modal Federated Learning for Cancer Staging over Non-IID Datasets with Unbalanced ModalitiesComments: 10 pages, 4 figures, 5 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The use of machine learning (ML) for cancer staging through medical image analysis has gained substantial interest across medical disciplines. When accompanied by the innovative federated learning (FL) framework, ML techniques can further overcome privacy concerns related to patient data exposure. Given the frequent presence of diverse data modalities within patient records, leveraging FL in a multi-modal learning framework holds considerable promise for cancer staging. However, existing works on multi-modal FL often presume that all data-collecting institutions have access to all data modalities. This oversimplified approach neglects institutions that have access to only a portion of data modalities within the system. In this work, we introduce a novel FL architecture designed to accommodate not only the heterogeneity of data samples, but also the inherent heterogeneity/non-uniformity of data modalities across institutions. We shed light on the challenges associated with varying convergence speeds observed across different data modalities within our FL system. Subsequently, we propose a solution to tackle these challenges by devising a distributed gradient blending and proximity-aware client weighting strategy tailored for multi-modal FL. To show the superiority of our method, we conduct experiments using The Cancer Genome Atlas program (TCGA) datalake considering different cancer types and three modalities of data: mRNA sequences, histopathological image data, and clinical information.
- [527] arXiv:2401.03630 (cross-list from cs.MA) [ pdf , other ]
-
Title: Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded YetSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: With the explosive influence caused by the success of large language models (LLM) like ChatGPT and GPT-4, there has been an extensive amount of recent work showing that foundation models can be used to solve a large variety of tasks. However, there is very limited work that shares insights on multi-agent planning. Multi-agent planning is different from other domains by combining the difficulty of multi-agent coordination and planning, and making it hard to leverage external tools to facilitate the reasoning needed. In this paper, we focus on the problem of multi-agent path finding (MAPF), which is also known as multi-robot route planning, and study the performance of solving MAPF with LLMs. We first show the motivating success on an empty room map without obstacles, then the failure to plan on the harder room map and maze map of the standard MAPF benchmark. We present our position on why directly solving MAPF with LLMs has not been successful yet, and we use various experiments to support our hypothesis. Based on our results, we discussed how researchers with different backgrounds could help with this problem from different perspectives.
- [528] arXiv:2401.03631 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Bridging the Skills Gap: Evaluating an AI-Assisted Provider Platform to Support Care Providers with Empathetic Delivery of Protocolized TherapyAuthors: William R. Kearns , Jessica Bertram , Myra Divina , Lauren Kemp , Yinzhou Wang , Alex Marin , Trevor Cohen , Weichao YuwenComments: Accepted: AMIA Annual Symposium 2023. To appear as: Kearns W, Bertram J, Divina M, Kemp L, Wang Y, Marin A, Cohen T, Yuwen W. Bridging the Skills Gap: Evaluating an AI-Assisted Provider Platform to Support Care Providers with Empathetic Delivery of Protocolized Therapy. AMIA Annual Symposium Proceedings 2023. American Medical Informatics AssociationSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: Despite the high prevalence and burden of mental health conditions, there is a global shortage of mental health providers. Artificial Intelligence (AI) methods have been proposed as a way to address this shortage, by supporting providers with less extensive training as they deliver care. To this end, we developed the AI-Assisted Provider Platform (A2P2), a text-based virtual therapy interface that includes a response suggestion feature, which supports providers in delivering protocolized therapies empathetically. We studied providers with and without expertise in mental health treatment delivering a therapy session using the platform with (intervention) and without (control) AI-assistance features. Upon evaluation, the AI-assisted system significantly decreased response times by 29.34% (p=0.002), tripled empathic response accuracy (p=0.0001), and increased goal recommendation accuracy by 66.67% (p=0.001) across both user groups compared to the control. Both groups rated the system as having excellent usability.
- [529] arXiv:2401.03676 (cross-list from cs.SE) [ pdf , other ]
-
Title: Assessing AI Detectors in Identifying AI-Generated Code: Implications for EducationAuthors: Wei Hung Pan , Ming Jie Chok , Jonathan Leong Shan Wong , Yung Xin Shin , Yeong Shian Poon , Zhou Yang , Chun Yong Chong , David Lo , Mei Kuan LimComments: 11 pages, paper accepted at 46th International Conference on Software Engineering, Software Engineering Education and Training Track (ICSE-SEET 2024)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Educators are increasingly concerned about the usage of Large Language Models (LLMs) such as ChatGPT in programming education, particularly regarding the potential exploitation of imperfections in Artificial Intelligence Generated Content (AIGC) Detectors for academic misconduct. In this paper, we present an empirical study where the LLM is examined for its attempts to bypass detection by AIGC Detectors. This is achieved by generating code in response to a given question using different variants. We collected a dataset comprising 5,069 samples, with each sample consisting of a textual description of a coding problem and its corresponding human-written Python solution codes. These samples were obtained from various sources, including 80 from Quescol, 3,264 from Kaggle, and 1,725 from LeetCode. From the dataset, we created 13 sets of code problem variant prompts, which were used to instruct ChatGPT to generate the outputs. Subsequently, we assessed the performance of five AIGC detectors. Our results demonstrate that existing AIGC Detectors perform poorly in distinguishing between human-written code and AI-generated code.
- [530] arXiv:2401.03694 (cross-list from cs.CV) [ pdf , other ]
-
Title: GloTSFormer: Global Video Text Spotting TransformerSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Video Text Spotting (VTS) is a fundamental visual task that aims to predict the trajectories and content of texts in a video. Previous works usually conduct local associations and apply IoU-based distance and complex post-processing procedures to boost performance, ignoring the abundant temporal information and the morphological characteristics in VTS. In this paper, we propose a novel Global Video Text Spotting Transformer GloTSFormer to model the tracking problem as global associations and utilize the Gaussian Wasserstein distance to guide the morphological correlation between frames. Our main contributions can be summarized as three folds. 1). We propose a Transformer-based global tracking method GloTSFormer for VTS and associate multiple frames simultaneously. 2). We introduce a Wasserstein distance-based method to conduct positional associations between frames. 3). We conduct extensive experiments on public datasets. On the ICDAR2015 video dataset, GloTSFormer achieves 56.0 MOTA with 4.6 absolute improvement compared with the previous SOTA method and outperforms the previous Transformer-based method by a significant 8.3 MOTA.
- [531] arXiv:2401.03695 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Large-Scale Empirical Study on Improving the Fairness of Image Classification ModelsComments: Accepted by the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024). Please include ISSTA in any citationsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Fairness has been a critical issue that affects the adoption of deep learning models in real practice. To improve model fairness, many existing methods have been proposed and evaluated to be effective in their own contexts. However, there is still no systematic evaluation among them for a comprehensive comparison under the same context, which makes it hard to understand the performance distinction among them, hindering the research progress and practical adoption of them. To fill this gap, this paper endeavours to conduct the first large-scale empirical study to comprehensively compare the performance of existing state-of-the-art fairness improving techniques. Specifically, we target the widely-used application scenario of image classification, and utilized three different datasets and five commonly-used performance metrics to assess in total 13 methods from diverse categories. Our findings reveal substantial variations in the performance of each method across different datasets and sensitive attributes, indicating over-fitting on specific datasets by many existing methods. Furthermore, different fairness evaluation metrics, due to their distinct focuses, yield significantly different assessment results. Overall, we observe that pre-processing methods and in-processing methods outperform post-processing methods, with pre-processing methods exhibiting the best performance. Our empirical study offers comprehensive recommendations for enhancing fairness in deep learning models. We approach the problem from multiple dimensions, aiming to provide a uniform evaluation platform and inspire researchers to explore more effective fairness solutions via a set of implications.
- [532] arXiv:2401.03717 (cross-list from cs.LG) [ pdf , other ]
-
Title: Universal Time-Series Representation Learning: A SurveyAuthors: Patara Trirat , Yooju Shin , Junhyeok Kang , Youngeun Nam , Jihye Na , Minyoung Bae , Joeun Kim , Byunghyun Kim , Jae-Gil LeeComments: 39 pages, 4 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Time-series data exists in every corner of real-world systems and services, ranging from satellites in the sky to wearable devices on human bodies. Learning representations by extracting and inferring valuable information from these time series is crucial for understanding the complex dynamics of particular phenomena and enabling informed decisions. With the learned representations, we can perform numerous downstream analyses more effectively. Among several approaches, deep learning has demonstrated remarkable performance in extracting hidden patterns and features from time-series data without manual feature engineering. This survey first presents a novel taxonomy based on three fundamental elements in designing state-of-the-art universal representation learning methods for time series. According to the proposed taxonomy, we comprehensively review existing studies and discuss their intuitions and insights into how these methods enhance the quality of learned representations. Finally, as a guideline for future studies, we summarize commonly used experimental setups and datasets and discuss several promising research directions. An up-to-date corresponding resource is available at this https URL .
- [533] arXiv:2401.03729 (cross-list from cs.CL) [ pdf , other ]
-
Title: The Butterfly Effect of Altering Prompts: How Small Changes and Jailbreaks Affect Large Language Model PerformanceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) are regularly being used to label data across many domains and for myriad tasks. By simply asking the LLM for an answer, or ``prompting,'' practitioners are able to use LLMs to quickly get a response for an arbitrary task. This prompting is done through a series of decisions by the practitioner, from simple wording of the prompt, to requesting the output in a certain data format, to jailbreaking in the case of prompts that address more sensitive topics. In this work, we ask: do variations in the way a prompt is constructed change the ultimate decision of the LLM? We answer this using a series of prompt variations across a variety of text classification tasks. We find that even the smallest of perturbations, such as adding a space at the end of a prompt, can cause the LLM to change its answer. Further, we find that requesting responses in XML and commonly used jailbreaks can have cataclysmic effects on the data labeled by LLMs.
- [534] arXiv:2401.03756 (cross-list from cs.LG) [ pdf , other ]
-
Title: Adaptive Experimental Design for Policy LearningComments: arXiv admin note: text overlap with arXiv:2302.02988Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Econometrics (econ.EM); Methodology (stat.ME); Machine Learning (stat.ML)
Abstract: Evidence-based targeting has been a topic of growing interest among the practitioners of policy and business. Formulating decision-maker's policy learning as a fixed-budget best arm identification (BAI) problem with contextual information, we study an optimal adaptive experimental design for policy learning with multiple treatment arms. In the sampling stage, the planner assigns treatment arms adaptively over sequentially arriving experimental units upon observing their contextual information (covariates). After the experiment, the planner recommends an individualized assignment rule to the population. Setting the worst-case expected regret as the performance criterion of adaptive sampling and recommended policies, we derive its asymptotic lower bounds, and propose a strategy, Adaptive Sampling-Policy Learning strategy (PLAS), whose leading factor of the regret upper bound aligns with the lower bound as the size of experimental units increases.
- [535] arXiv:2401.03768 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Corn Yield Prediction Model with Deep Neural Networks for Smallholder Farmer Decision Support SystemComments: 30 Pages, 11 Figures, 3 TablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Abstract: Crop yield prediction has been modeled on the assumption that there is no interaction between weather and soil variables. However, this paper argues that an interaction exists, and it can be finely modelled using the Kendall Correlation coefficient. Given the nonlinearity of the interaction between weather and soil variables, a deep neural network regressor (DNNR) is carefully designed with consideration to the depth, number of neurons of the hidden layers, and the hyperparameters with their optimizations. Additionally, a new metric, the average of absolute root squared error (ARSE) is proposed to combine the strengths of root mean square error (RMSE) and mean absolute error (MAE). With the ARSE metric, the proposed DNNR(s), optimised random forest regressor (RFR) and the extreme gradient boosting regressor (XGBR) achieved impressively small yield errors, 0.0172 t/ha, and 0.0243 t/ha, 0.0001 t/ha, and 0.001 t/ha, respectively. However, the DNNR(s), with changes to the explanatory variables to ensure generalizability to unforeseen data, DNNR(s) performed best. Further analysis reveals that a strong interaction does exist between weather and soil variables. Precisely, yield is observed to increase when precipitation is reduced and silt increased, and vice-versa. However, the degree of decrease or increase is not quantified in this paper. Contrary to existing yield models targeted towards agricultural policies and global food security, the goal of the proposed corn yield model is to empower the smallholder farmer to farm smartly and intelligently, thus the prediction model is integrated into a mobile application that includes education, and a farmer-to-market access module.
- [536] arXiv:2401.03786 (cross-list from cs.LG) [ pdf , other ]
-
Title: Long-term Safe Reinforcement Learning with Binary FeedbackComments: Accepted to AAAI-24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Safety is an indispensable requirement for applying reinforcement learning (RL) to real problems. Although there has been a surge of safe RL algorithms proposed in recent years, most existing work typically 1) relies on receiving numeric safety feedback; 2) does not guarantee safety during the learning process; 3) limits the problem to a priori known, deterministic transition dynamics; and/or 4) assume the existence of a known safe policy for any states. Addressing the issues mentioned above, we thus propose Long-term Binaryfeedback Safe RL (LoBiSaRL), a safe RL algorithm for constrained Markov decision processes (CMDPs) with binary safety feedback and an unknown, stochastic state transition function. LoBiSaRL optimizes a policy to maximize rewards while guaranteeing a long-term safety that an agent executes only safe state-action pairs throughout each episode with high probability. Specifically, LoBiSaRL models the binary safety function via a generalized linear model (GLM) and conservatively takes only a safe action at every time step while inferring its effect on future safety under proper assumptions. Our theoretical results show that LoBiSaRL guarantees the long-term safety constraint, with high probability. Finally, our empirical results demonstrate that our algorithm is safer than existing methods without significantly compromising performance in terms of reward.
- [537] arXiv:2401.03804 (cross-list from cs.CL) [ pdf , other ]
-
Title: TeleChat Technical ReportAuthors: Zhongjiang He , Zihan Wang , Xinzhang Liu , Shixuan Liu , Yitong Yao , Yuyao Huang , Xuelong Li , Yongxiang Li , Zhonghao Che , Zhaoxi Zhang , Yan Wang , Xin Wang , Luwen Pu , Huinan Xu , Ruiyu Fang , Yu Zhao , Jie Zhang , Xiaomeng Huang , Zhilong Lu , Jiaxin Peng , Wenjun Zheng , Shiquan Wang , Bingkai Yang , Xuewei he , Zhuoru Jiang , Qiyi Xie , Yanhan Zhang , Zhongqiu Li , Lingling Shi , Weiwei Fu , Yin Zhang , Zilu Huang , Sishi Xiong , Yuxiang Zhang , Chao Wang , Shuangyong SongComments: 28 pages, 2 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this technical report, we present TeleChat, a collection of large language models (LLMs) with parameters of 3 billion, 7 billion and 12 billion. It includes pretrained language models as well as fine-tuned chat models that is aligned with human preferences. TeleChat is initially pretrained on an extensive corpus containing a diverse collection of texts from both English and Chinese languages, including trillions of tokens. Subsequently, the model undergoes fine-tuning to align with human preferences, following a detailed methodology that we describe. We evaluate the performance of TeleChat on various tasks, including language understanding, mathematics, reasoning, code generation, and knowledge-based question answering. Our findings indicate that TeleChat achieves comparable performance to other open-source models of similar size across a wide range of public benchmarks. To support future research and applications utilizing LLMs, we release the fine-tuned model checkpoints of TeleChat's 7B and 12B variant, along with code and a portion of our pretraining data, to the public community.
- [538] arXiv:2401.03830 (cross-list from cs.LG) [ pdf , other ]
-
Title: A foundation for exact binarized morphological neural networksComments: Accepted at conference ICCV 2023 Workshop LBQNN. Same work, different format, accepted at conference NeurIPS 2023 Workshop WANT. 8 pages, 17 pages appendixSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Training and running deep neural networks (NNs) often demands a lot of computation and energy-intensive specialized hardware (e.g. GPU, TPU...). One way to reduce the computation and power cost is to use binary weight NNs, but these are hard to train because the sign function has a non-smooth gradient. We present a model based on Mathematical Morphology (MM), which can binarize ConvNets without losing performance under certain conditions, but these conditions may not be easy to satisfy in real-world scenarios. To solve this, we propose two new approximation methods and develop a robust theoretical framework for ConvNets binarization using MM. We propose as well regularization losses to improve the optimization. We empirically show that our model can learn a complex morphological network, and explore its performance on a classification task.
- [539] arXiv:2401.03854 (cross-list from cs.CV) [ pdf , other ]
-
Title: TIER: Text-Image Encoder-based Regression for AIGC Image Quality AssessmentAuthors: Jiquan Yuan , Xinyan Cao , Jinming Che , Qinyuan Wang , Sen Liang , Wei Ren , Jinlong Lin , Xixin CaoComments: 12 pages, 8 figures. arXiv admin note: text overlap with arXiv:2312.05897Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recently, AIGC image quality assessment (AIGCIQA), which aims to assess the quality of AI-generated images (AIGIs) from a human perception perspective, has emerged as a new topic in computer vision. Unlike common image quality assessment tasks where images are derived from original ones distorted by noise, blur, and compression, \textit{etc.}, in AIGCIQA tasks, images are typically generated by generative models using text prompts. Considerable efforts have been made in the past years to advance AIGCIQA. However, most existing AIGCIQA methods regress predicted scores directly from individual generated images, overlooking the information contained in the text prompts of these images. This oversight partially limits the performance of these AIGCIQA methods. To address this issue, we propose a text-image encoder-based regression (TIER) framework. Specifically, we process the generated images and their corresponding text prompts as inputs, utilizing a text encoder and an image encoder to extract features from these text prompts and generated images, respectively. To demonstrate the effectiveness of our proposed TIER method, we conduct extensive experiments on several mainstream AIGCIQA databases, including AGIQA-1K, AGIQA-3K, and AIGCIQA2023. The experimental results indicate that our proposed TIER method generally demonstrates superior performance compared to baseline in most cases.
- [540] arXiv:2401.03855 (cross-list from cs.CL) [ pdf , other ]
-
Title: PythonSaga: Redefining the Benchmark to Evaluate Code Generating LLMSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Driven by the surge in code generation using large language models (LLMs), numerous benchmarks have emerged to evaluate these LLMs capabilities. We conducted a large-scale human evaluation of HumanEval and MBPP, two popular benchmarks for Python code generation, analyzing their diversity and difficulty. Our findings unveil a critical bias towards a limited set of programming concepts, neglecting most of the other concepts entirely. Furthermore, we uncover a worrying prevalence of easy tasks, potentially inflating model performance estimations. To address these limitations, we propose a novel benchmark, PythonSaga, featuring 185 hand-crafted prompts on a balanced representation of 38 programming concepts across diverse difficulty levels.
- [541] arXiv:2401.03857 (cross-list from cs.LG) [ pdf , other ]
-
Title: Inverse Reinforcement Learning with Sub-optimal ExpertsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Inverse Reinforcement Learning (IRL) techniques deal with the problem of deducing a reward function that explains the behavior of an expert agent who is assumed to act optimally in an underlying unknown task. In several problems of interest, however, it is possible to observe the behavior of multiple experts with different degree of optimality (e.g., racing drivers whose skills ranges from amateurs to professionals). For this reason, in this work, we extend the IRL formulation to problems where, in addition to demonstrations from the optimal agent, we can observe the behavior of multiple sub-optimal experts. Given this problem, we first study the theoretical properties of the class of reward functions that are compatible with a given set of experts, i.e., the feasible reward set. Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards. Furthermore, we study the statistical complexity of estimating the feasible reward set with a generative model. To this end, we analyze a uniform sampling algorithm that results in being minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to the one of the optimal agent.
- [542] arXiv:2401.03868 (cross-list from cs.AR) [ pdf , other ]
-
Title: FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGAsAuthors: Shulin Zeng , Jun Liu , Guohao Dai , Xinhao Yang , Tianyu Fu , Hongyi Wang , Wenheng Ma , Hanbo Sun , Shiyao Li , Zixiao Huang , Yadong Dai , Jintao Li , Zehao Wang , Ruoyu Zhang , Kairui Wen , Xuefei Ning , Yu WangComments: Accepted to FPGA'24Subjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: Transformer-based Large Language Models (LLMs) have made a significant impact on various domains. However, LLMs' efficiency suffers from both heavy computation and memory overheads. Compression techniques like sparsification and quantization are commonly used to mitigate the gap between LLM's computation/memory overheads and hardware capacity. However, existing GPU and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency, underutilized memory bandwidth, and large compilation overheads.
This paper proposes FlightLLM, enabling efficient LLMs inference with a complete mapping flow on FPGAs. In FlightLLM, we highlight an innovative solution that the computation and memory overhead of LLMs can be solved by utilizing FPGA-specific resources (e.g., DSP48 and heterogeneous memory hierarchy). We propose a configurable sparse DSP chain to support different sparsity patterns with high computation efficiency. Second, we propose an always-on-chip decode scheme to boost memory bandwidth with mixed-precision support. Finally, to make FlightLLM available for real-world LLMs, we propose a length adaptive compilation method to reduce the compilation overhead. Implemented on the Xilinx Alveo U280 FPGA, FlightLLM achieves 6.0$\times$ higher energy efficiency and 1.8$\times$ better cost efficiency against commercial GPUs (e.g., NVIDIA V100S) on modern LLMs (e.g., LLaMA2-7B) using vLLM and SmoothQuant under the batch size of one. FlightLLM beats NVIDIA A100 GPU with 1.2$\times$ higher throughput using the latest Versal VHK158 FPGA. - [543] arXiv:2401.03890 (cross-list from cs.CV) [ pdf , other ]
-
Title: A Survey on 3D Gaussian SplattingComments: Ongoing projectSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM)
Abstract: 3D Gaussian splatting (GS) has recently emerged as a transformative technique in the realm of explicit radiance field and computer graphics. This innovative approach, characterized by the utilization of millions of learnable 3D Gaussians, represents a significant departure from mainstream neural radiance field approaches, which predominantly use implicit, coordinate-based models to map spatial coordinates to pixel values. 3D GS, with its explicit scene representation and differentiable rendering algorithm, not only promises real-time rendering capability but also introduces unprecedented levels of editability. This positions 3D GS as a potential game-changer for the next generation of 3D reconstruction and representation. In the present paper, we provide the first systematic overview of the recent developments and critical contributions in the domain of 3D GS. We begin with a detailed exploration of the underlying principles and the driving forces behind the emergence of 3D GS, laying the groundwork for understanding its significance. A focal point of our discussion is the practical applicability of 3D GS. By enabling unprecedented rendering speed, 3D GS opens up a plethora of applications, ranging from virtual reality to interactive media and beyond. This is complemented by a comparative analysis of leading 3D GS models, evaluated across various benchmark tasks to highlight their performance and practical utility. The survey concludes by identifying current challenges and suggesting potential avenues for future research in this domain. Through this survey, we aim to provide a valuable resource for both newcomers and seasoned researchers, fostering further exploration and advancement in applicable and explicit radiance field representation.
- [544] arXiv:2401.03910 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Philosophical Introduction to Language Models -- Part I: Continuity With Classic DebatesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models like GPT-4 have achieved remarkable proficiency in a broad spectrum of language-based tasks, some of which are traditionally associated with hallmarks of human intelligence. This has prompted ongoing disagreements about the extent to which we can meaningfully ascribe any kind of linguistic or cognitive competence to language models. Such questions have deep philosophical roots, echoing longstanding debates about the status of artificial neural networks as cognitive models. This article -- the first part of two companion papers -- serves both as a primer on language models for philosophers, and as an opinionated survey of their significance in relation to classic debates in the philosophy cognitive science, artificial intelligence, and linguistics. We cover topics such as compositionality, language acquisition, semantic competence, grounding, world models, and the transmission of cultural knowledge. We argue that the success of language models challenges several long-held assumptions about artificial neural networks. However, we also highlight the need for further empirical investigation to better understand their internal mechanisms. This sets the stage for the companion paper (Part II), which turns to novel empirical methods for probing the inner workings of language models, and new philosophical questions prompted by their latest developments.
- [545] arXiv:2401.03925 (cross-list from cs.DB) [ pdf , ps , other ]
-
Title: Rastro-DM: data mining with a trailComments: It was published in the Brazilian Federal Court of Accounts Journal n. 145 on 2021 ( this https URL )Journal-ref: Revista do TCU (Brazilian Federal Court of Accounts), 145 (2021): 79-106Subjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper proposes a methodology for documenting data mining (DM) projects, Rastro-DM (Trail Data Mining), with a focus not on the model that is generated, but on the processes behind its construction, in order to leave a trail (Rastro in Portuguese) of planned actions, training completed, results obtained, and lessons learned. The proposed practices are complementary to structuring methodologies of DM, such as CRISP-DM, which establish a methodological and paradigmatic framework for the DM process. The application of best practices and their benefits is illustrated in a project called 'Cladop' that was created for the classification of PDF documents associated with the investigative process of damages to the Brazilian Federal Public Treasury. Building the Rastro-DM kit in the context of a project is a small step that can lead to an institutional leap to be achieved by sharing and using the trail across the enterprise.
- [546] arXiv:2401.03955 (cross-list from cs.LG) [ pdf , other ]
-
Title: Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time SeriesAuthors: Vijay Ekambaram , Arindam Jati , Nam H. Nguyen , Pankaj Dayama , Chandra Reddy , Wesley M. Gifford , Jayant KalagnanamSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Large pre-trained models for zero/few-shot learning excel in language and vision domains but encounter challenges in multivariate time series (TS) due to the diverse nature and scarcity of publicly available pre-training data. Consequently, there has been a recent surge in utilizing pre-trained large language models (LLMs) with token adaptations for TS forecasting. These approaches employ cross-domain transfer learning and surprisingly yield impressive results. However, these models are typically very slow and large (~billion parameters) and do not consider cross-channel correlations. To address this, we present Tiny Time Mixers (TTM), a significantly small model based on the lightweight TSMixer architecture. TTM marks the first success in developing fast and tiny general pre-trained models (<1M parameters), exclusively trained on public TS datasets, with effective transfer learning capabilities for forecasting. To tackle the complexity of pre-training on multiple datasets with varied temporal resolutions, we introduce several novel enhancements such as adaptive patching, dataset augmentation via downsampling, and resolution prefix tuning. Moreover, we employ a multi-level modeling strategy to effectively model channel correlations and infuse exogenous signals during fine-tuning, a crucial capability lacking in existing benchmarks. TTM shows significant accuracy gains (12-38\%) over popular benchmarks in few/zero-shot forecasting. It also drastically reduces the compute needs as compared to LLM-TS methods, with a 14X cut in learnable parameters, 106X less total parameters, and substantial reductions in fine-tuning (65X) and inference time (54X). In fact, TTM's zero-shot often surpasses the few-shot results in many popular benchmarks, highlighting the efficacy of our approach. Models and source code are available at this https URL
- [547] arXiv:2401.03988 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Primer on Temporal Graph LearningComments: 19 pages, 47 equationsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Social and Information Networks (cs.SI); Signal Processing (eess.SP)
Abstract: This document aims to familiarize readers with temporal graph learning (TGL) through a concept-first approach. We have systematically presented vital concepts essential for understanding the workings of a TGL framework. In addition to qualitative explanations, we have incorporated mathematical formulations where applicable, enhancing the clarity of the text. Since TGL involves temporal and spatial learning, we introduce relevant learning architectures ranging from recurrent and convolutional neural networks to transformers and graph neural networks. We also discuss classical time series forecasting methods to inspire interpretable learning solutions for TGL.
- [548] arXiv:2401.03993 (cross-list from cs.LG) [ pdf , other ]
-
Title: Behavioural Cloning in VizDoomComments: 13 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: This paper describes methods for training autonomous agents to play the game "Doom 2" through Imitation Learning (IL) using only pixel data as input. We also explore how Reinforcement Learning (RL) compares to IL for humanness by comparing camera movement and trajectory data. Through behavioural cloning, we examine the ability of individual models to learn varying behavioural traits. We attempt to mimic the behaviour of real players with different play styles, and find we can train agents that behave aggressively, passively, or simply more human-like than traditional AIs. We propose these methods of introducing more depth and human-like behaviour to agents in video games. The trained IL agents perform on par with the average players in our dataset, whilst outperforming the worst players. While performance was not as strong as common RL approaches, it provides much stronger human-like behavioural traits to the agent.
- [549] arXiv:2401.03999 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Polynomial Precision Dependence Solutions to Alignment Research Center Matrix Completion ProblemsAuthors: Rico AngellSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We present solutions to the matrix completion problems proposed by the Alignment Research Center that have a polynomial dependence on the precision $\varepsilon$. The motivation for these problems is to enable efficient computation of heuristic estimators to formally evaluate and reason about different quantities of deep neural networks in the interest of AI alignment. Our solutions involve reframing the matrix completion problems as a semidefinite program (SDP) and using recent advances in spectral bundle methods for fast, efficient, and scalable SDP solving.
- [550] arXiv:2401.04003 (cross-list from cs.RO) [ pdf , other ]
-
Title: Simultaneous Task Allocation and Planning for Multi-Robots under Hierarchical Temporal Logic SpecificationsComments: 20 pages, 7 figuresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL)
Abstract: Past research into robotic planning with temporal logic specifications, notably Linear Temporal Logic (LTL), was largely based on singular formulas for individual or groups of robots. But with increasing task complexity, LTL formulas unavoidably grow lengthy, complicating interpretation and specification generation, and straining the computational capacities of the planners. By leveraging the intrinsic structure of tasks, we introduced a hierarchical structure to LTL specifications with requirements on syntax and semantics, and proved that they are more expressive than their flat counterparts. Second, we employ a search-based approach to synthesize plans for a multi-robot system, accomplishing simultaneous task allocation and planning. The search space is approximated by loosely interconnected sub-spaces, with each sub-space corresponding to one LTL specification. The search is predominantly confined to a single sub-space, transitioning to another sub-space under certain conditions, determined by the decomposition of automatons. Moreover, multiple heuristics are formulated to expedite the search significantly. A theoretical analysis concerning completeness and optimality is conducted under mild assumptions. When compared with existing methods on service tasks, our method outperforms in terms of execution times with comparable solution quality. Finally, scalability is evaluated by testing a group of 30 robots and achieving reasonable runtimes.
- [551] arXiv:2401.04023 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video ClassificationAuthors: Wentao ZhuComments: Accepted by WACV 2024; well-formatted PDF is in this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: In recent years, researchers combine both audio and video signals to deal with challenges where actions are not well represented or captured by visual cues. However, how to effectively leverage the two modalities is still under development. In this work, we develop a multiscale multimodal Transformer (MMT) that leverages hierarchical representation learning. Particularly, MMT is composed of a novel multiscale audio Transformer (MAT) and a multiscale video Transformer [43]. To learn a discriminative cross-modality fusion, we further design multimodal supervised contrastive objectives called audio-video contrastive loss (AVC) and intra-modal contrastive loss (IMC) that robustly align the two modalities. MMT surpasses previous state-of-the-art approaches by 7.3% and 2.1% on Kinetics-Sounds and VGGSound in terms of the top-1 accuracy without external training data. Moreover, the proposed MAT significantly outperforms AST [28] by 22.2%, 4.4% and 4.7% on three public benchmark datasets, and is about 3% more efficient based on the number of FLOPs and 9.8% more efficient based on GPU memory usage.
- [552] arXiv:2401.04057 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Unveiling Bias in Fairness Evaluations of Large Language Models: A Critical Literature Review of Music and Movie Recommendation SystemsComments: 10 pagesSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: The rise of generative artificial intelligence, particularly Large Language Models (LLMs), has intensified the imperative to scrutinize fairness alongside accuracy. Recent studies have begun to investigate fairness evaluations for LLMs within domains such as recommendations. Given that personalization is an intrinsic aspect of recommendation systems, its incorporation into fairness assessments is paramount. Yet, the degree to which current fairness evaluation frameworks account for personalization remains unclear. Our comprehensive literature review aims to fill this gap by examining how existing frameworks handle fairness evaluations of LLMs, with a focus on the integration of personalization factors. Despite an exhaustive collection and analysis of relevant works, we discovered that most evaluations overlook personalization, a critical facet of recommendation systems, thereby inadvertently perpetuating unfair practices. Our findings shed light on this oversight and underscore the urgent need for more nuanced fairness evaluations that acknowledge personalization. Such improvements are vital for fostering equitable development within the AI community.
- [553] arXiv:2401.04081 (cross-list from cs.LG) [ pdf , other ]
-
Title: MoE-Mamba: Efficient Selective State Space Models with Mixture of ExpertsAuthors: Maciej Pióro , Kamil Ciebiera , Krystian Król , Jan Ludziejewski , Michał Krutul , Jakub Krajewski , Szymon Antoniak , Piotr Miłoś , Marek Cygan , Sebastian JaszczurSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based Large Language Models, including recent state-of-the-art open models. We propose that to unlock the potential of SSMs for scaling, they should be combined with MoE. We showcase this on Mamba, a recent SSM-based model that achieves remarkable performance. Our model, MoE-Mamba, outperforms both Mamba and baseline Transformer-MoE. In particular, MoE-Mamba reaches the same performance as Mamba in $2.35\times$ fewer training steps while preserving the inference performance gains of Mamba against Transformer.
- [554] arXiv:2401.04105 (cross-list from cs.CV) [ pdf , other ]
-
Title: Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient FinetuningAuthors: Chen Zhao , Shuming Liu , Karttikeya Mangalam , Guocheng Qian , Fatimah Zohra , Abdulmohsen Alghannam , Jitendra Malik , Bernard GhanemJournal-ref: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Large pretrained models are increasingly crucial in modern computer vision tasks. These models are typically used in downstream tasks by end-to-end finetuning, which is highly memory-intensive for tasks with high-resolution data, e.g., video understanding, small object detection, and point cloud analysis. In this paper, we propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, a novel family of network architectures that acts as a surrogate network to finetune a pretrained model with substantially reduced memory consumption. Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible. Due to its reversibility, intermediate activations, which can be reconstructed from output, are cleared from memory during training. We use two coefficients on either type of residual connections respectively, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision. We evaluate Dr$^2$Net on various pretrained models and various tasks, and show that it can reach comparable performance to conventional finetuning but with significantly less memory usage.
- [555] arXiv:2401.04108 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Working with Trouble and Failures in Conversation between Humans and Robots (WTF 2023) & Is CUI Design Ready Yet?Authors: Frank Förster , Marta Romeo , Patrick Holthaus , Maria Jose Galvez Trigo , Joel E. Fischer , Birthe Nesset , Christian Dondrup , Christine Murad , Cosmin Munteanu , Benjamin R. Cowan , Leigh Clark , Martin Porcheron , Heloisa Candello , Raina LangevinComments: WTF 2023 & 'Is CUI Design Ready Yet?' workshop proceedings including 10 extended abstracts and articlesSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Robotics (cs.RO)
Abstract: Workshop proceedings of two co-located workshops "Working with Troubles and Failures in Conversation with Humans and Robots" (WTF 2023) and "Is CUI Design Ready Yet?", both of which were part of the ACM conference on conversational user interfaces 2023.
WTF 23 aimed at bringing together researchers from human-robot interaction, dialogue systems, human-computer interaction, and conversation analysis. Despite all progress, robotic speech interfaces continue to be brittle in a number of ways and the experience of failure of such interfaces is commonplace amongst roboticists. However, the technical literature is positively skewed toward their good performance. The workshop aims to provide a platform for discussing communicative troubles and failures in human-robot interactions and related failures in non-robotic speech interfaces. Aims include a scrupulous investigation into communicative failures, to begin working on a taxonomy of such failures, and enable a preliminary discussion on possible mitigating strategies. Workshop website: this https URL
Is CUI Design Ready Yet? As CUIs become more prevalent in both academic research and the commercial market, it becomes more essential to design usable and adoptable CUIs. While research has been growing on the methods for designing CUIs for commercial use, there has been little discussion on the overall community practice of developing design resources to aid in practical CUI design. The aim of this workshop, therefore, is to bring the CUI community together to discuss the current practices for developing tools and resources for practical CUI design, the adoption (or non-adoption) of these tools and resources, and how these resources are utilized in the training and education of new CUI designers entering the field. Workshop website: this https URL - [556] arXiv:2401.04119 (cross-list from cs.HC) [ pdf , other ]
-
Title: Why is the User Interface a Dark Pattern? : Explainable Auto-Detection and its AnalysisComments: IEEE International Conference on Big Data (IEEE BigData 2022)Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Dark patterns are deceptive user interface designs for online services that make users behave in unintended ways. Dark patterns, such as privacy invasion, financial loss, and emotional distress, can harm users. These issues have been the subject of considerable debate in recent years. In this paper, we study interpretable dark pattern auto-detection, that is, why a particular user interface is detected as having dark patterns. First, we trained a model using transformer-based pre-trained language models, BERT, on a text-based dataset for the automatic detection of dark patterns in e-commerce. Then, we applied post-hoc explanation techniques, including local interpretable model agnostic explanation (LIME) and Shapley additive explanations (SHAP), to the trained model, which revealed which terms influence each prediction as a dark pattern. In addition, we extracted and analyzed terms that affected the dark patterns. Our findings may prevent users from being manipulated by dark patterns, and aid in the construction of more equitable internet services. Our code is available at this https URL .
- [557] arXiv:2401.04120 (cross-list from cs.HC) [ pdf , other ]
-
Title: Generation Z's Ability to Discriminate Between AI-generated and Human-Authored Text on DiscordSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The growing popularity of generative artificial intelligence (AI) chatbots such as ChatGPT is having transformative effects on social media. As the prevalence of AI-generated content grows, concerns have been raised regarding privacy and misinformation online. Among social media platforms, Discord enables AI integrations -- making their primarily "Generation Z" userbase particularly exposed to AI-generated content. We surveyed Generation Z aged individuals (n = 335) to evaluate their proficiency in discriminating between AI-generated and human-authored text on Discord. The investigation employed one-shot prompting of ChatGPT, disguised as a text message received on the this http URL platform. We explore the influence of demographic factors on ability, as well as participants' familiarity with Discord and artificial intelligence technologies. We find that Generation Z individuals are unable to discern between AI and human-authored text (p = 0.011), and that those with lower self-reported familiarity with Discord demonstrated an improved ability in identifying human-authored compared to those with self-reported experience with AI (p << 0.0001). Our results suggest that there is a nuanced relationship between AI technology and popular modes of communication for Generation Z, contributing valuable insights into human-computer interactions, digital communication, and artificial intelligence literacy.
- [558] arXiv:2401.04122 (cross-list from cs.HC) [ pdf , other ]
-
Title: From Prompt Engineering to Prompt Science With Human in the LoopAuthors: Chirag ShahSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: As LLMs make their way into many aspects of our lives, one place that warrants increased scrutiny with LLM usage is scientific research. Using LLMs for generating or analyzing data for research purposes is gaining popularity. But when such application is marred with ad-hoc decisions and engineering solutions, we need to be concerned about how it may affect that research, its findings, or any future works based on that research. We need a more scientific approach to using LLMs in our research. While there are several active efforts to support more systematic construction of prompts, they are often focused more on achieving desirable outcomes rather than producing replicable and generalizable knowledge with sufficient transparency, objectivity, or rigor. This article presents a new methodology inspired by codebook construction through qualitative methods to address that. Using humans in the loop and a multi-phase verification processes, this methodology lays a foundation for more systematic, objective, and trustworthy way of applying LLMs for analyzing data. Specifically, we show how a set of researchers can work through a rigorous process of labeling, deliberating, and documenting to remove subjectivity and bring transparency and replicability to prompt generation process.
- [559] arXiv:2401.04124 (cross-list from cs.HC) [ pdf , other ]
-
Title: MobileAgent: enhancing mobile control via human-machine interaction and SOP integrationAuthors: Tinghe DingComments: agent, mobile control, SOP, human-machine interactionSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Agents centered around Large Language Models (LLMs) are now capable of automating mobile device operations for users. After fine-tuning to learn a user's mobile operations, these agents can adhere to high-level user instructions online. They execute tasks such as goal decomposition, sequencing of sub-goals, and interactive environmental exploration, until the final objective is achieved. However, privacy concerns related to personalized user data arise during mobile operations, requiring user confirmation. Moreover, users' real-world operations are exploratory, with action data being complex and redundant, posing challenges for agent learning. To address these issues, in our practical application, we have designed interactive tasks between agents and humans to identify sensitive information and align with personalized user needs. Additionally, we integrated Standard Operating Procedure (SOP) information within the model's in-context learning to enhance the agent's comprehension of complex task execution. Our approach is evaluated on the new device control benchmark AitW, which encompasses 30K unique instructions across multi-step tasks, including application operation, web searching, and web shopping. Experimental results show that the SOP-based agent achieves state-of-the-art performance in LLMs without incurring additional inference costs, boasting an overall action success rate of 66.92\%. The code and data examples are available at this https URL .
- [560] arXiv:2401.04126 (cross-list from cs.HC) [ src ]
-
Title: The Concept of the Tactile Signature System for Individuals with Visual ImpairmentsComments: The principles of connection between the interaction of a blind individual with the tactile field, when he uses touch, drawing various figures and forms, and the resulting prompts to the user to create a valid signature, have not been disclosedSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: The lack of an accessible and effective system for blind individuals to create handwritten signatures presents a significant barrier to their independence and full participation in various aspects of life. This research introduces the Tactile Signature System, a groundbreaking approach that empowers individuals with visual impairments to form their unique handwritten signatures. Key features of the system include: Personalized customization: Through tactile interaction and voice algorithmic guidance, individuals create signatures reflecting their preferences and natural writing style. Real-time feedback: AI-powered voice prompts and analysis ensure accuracy and consistency in signature formation. Accessibility: Installation in local service centers provides a secure and supervised environment for signature creation. The system's impact reaches beyond the individual level: Promotes inclusivity and independence: Blind individuals can engage in legal and financial transactions without relying on others. Empowers and fosters equal opportunities: Participation in education, employment, and civic engagement becomes more accessible. Aligns with international conventions: Upholds the right of persons with disabilities to participate fully in society. The Tactile Signature System represents a significant step towards an inclusive and accessible future for individuals with visual impairments.
- [561] arXiv:2401.04130 (cross-list from cs.LG) [ pdf , other ]
-
Title: Plug-and-Play Transformer Modules for Test-Time AdaptationAuthors: Xiangyu Chang , Sk Miraj Ahmed , Srikanth V. Krishnamurthy , Basak Guler , Ananthram Swami , Samet Oymak , Amit K. Roy-ChowdhurySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during test time can be very large, and the data is usually unlabeled. Thus, adaptation to new domains is challenging; it is also impractical to generate customized tuned modules for each such domain. Toward addressing these challenges, this work introduces PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy. We pre-train a large set of modules, each specialized for different source domains, effectively creating a ``module store''. Given a target domain with few-shot unlabeled data, we introduce an unsupervised test-time adaptation (TTA) method to (1) select a sparse subset of relevant modules from this store and (2) create a weighted combination of selected modules without tuning their weights. This plug-and-play nature enables us to harness multiple most-relevant source domains in a single inference call. Comprehensive evaluations demonstrate that PLUTO uniformly outperforms alternative TTA methods and that selecting $\leq$5 modules suffice to extract most of the benefit. At a high level, our method equips pre-trained transformers with the capability to dynamically adapt to new domains, motivating a new paradigm for efficient and scalable domain adaptation.
- [562] arXiv:2401.04133 (cross-list from cs.LG) [ pdf , other ]
-
Title: SynHIN: Generating Synthetic Heterogeneous Information Network for Explainable AISubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Graph Neural Networks (GNNs) excel in various domains, from detecting e-commerce spam to social network classification problems. However, the lack of public graph datasets hampers research progress, particularly in heterogeneous information networks (HIN). The demand for datasets for fair HIN comparisons is growing due to advancements in GNN interpretation models. In response, we propose SynHIN, a unique method for generating synthetic heterogeneous information networks. SynHIN identifies motifs in real-world datasets, summarizes graph statistics, and constructs a synthetic network. Our approach utilizes In-Cluster and Out-Cluster Merge modules to build the synthetic HIN from primary motif clusters. After In/Our-Cluster mergers and a post-pruning process fitting the real dataset constraints, we ensure the synthetic graph statistics align closely with the reference one. SynHIN generates a synthetic heterogeneous graph dataset for node classification tasks, using the primary motif as the explanation ground truth. It can adapt and address the lack of heterogeneous graph datasets and motif ground truths, proving beneficial for assessing heterogeneous graph neural network explainers. We further present a benchmark dataset for future heterogeneous graph explainer model research. Our work marks a significant step towards explainable AI in HGNNs.
- [563] arXiv:2401.04134 (cross-list from cs.NE) [ pdf , other ]
-
Title: Web Neural Network with Complete DiGraphsAuthors: Frank LiSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces a new neural network model that aims to mimic the biological brain more closely by structuring the network as a complete directed graph that processes continuous data for each timestep. Current neural networks have structures that vaguely mimic the brain structure, such as neurons, convolutions, and recurrence. The model proposed in this paper adds additional structural properties by introducing cycles into the neuron connections and removing the sequential nature commonly seen in other network layers. Furthermore, the model has continuous input and output, inspired by spiking neural networks, which allows the network to learn a process of classification, rather than simply returning the final result.
- [564] arXiv:2401.04135 (cross-list from cs.LG) [ pdf , other ]
-
Title: Global-Aware Enhanced Spatial-Temporal Graph Recurrent Networks: A New Framework For Traffic Flow PredictionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Traffic flow prediction plays a crucial role in alleviating traffic congestion and enhancing transport efficiency. While combining graph convolution networks with recurrent neural networks for spatial-temporal modeling is a common strategy in this realm, the restricted structure of recurrent neural networks limits their ability to capture global information. For spatial modeling, many prior studies learn a graph structure that is assumed to be fixed and uniform at all time steps, which may not be true. This paper introduces a novel traffic prediction framework, Global-Aware Enhanced Spatial-Temporal Graph Recurrent Network (GA-STGRN), comprising two core components: a spatial-temporal graph recurrent neural network and a global awareness layer. Within this framework, three innovative prediction models are formulated. A sequence-aware graph neural network is proposed and integrated into the Gated Recurrent Unit (GRU) to learn non-fixed graphs at different time steps and capture local temporal relationships. To enhance the model's global perception, three distinct global spatial-temporal transformer-like architectures (GST^2) are devised for the global awareness layer. We conduct extensive experiments on four real traffic datasets and the results demonstrate the superiority of our framework and the three concrete models.
- [565] arXiv:2401.04136 (cross-list from cs.CR) [ pdf , other ]
-
Title: The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning PipelineComments: This study reveals that by subtly inserting non-copyright-infringing poisoning data into a diffusion model's training dataset, it's possible to trigger the model to generate copyrighted content, highlighting vulnerabilities in current copyright protection strategiesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The commercialization of diffusion models, renowned for their ability to generate high-quality images that are often indistinguishable from real ones, brings forth potential copyright concerns. Although attempts have been made to impede unauthorized access to copyrighted material during training and to subsequently prevent DMs from generating copyrighted images, the effectiveness of these solutions remains unverified. This study explores the vulnerabilities associated with copyright protection in DMs by introducing a backdoor data poisoning attack (SilentBadDiffusion) against text-to-image diffusion models. Our attack method operates without requiring access to or control over the diffusion model's training or fine-tuning processes; it merely involves the insertion of poisoning data into the clean training dataset. This data, comprising poisoning images equipped with prompts, is generated by leveraging the powerful capabilities of multimodal large language models and text-guided image inpainting techniques. Our experimental results and analysis confirm the method's effectiveness. By integrating a minor portion of non-copyright-infringing stealthy poisoning data into the clean dataset-rendering it free from suspicion-we can prompt the finetuned diffusion models to produce copyrighted content when activated by specific trigger prompts. These findings underline potential pitfalls in the prevailing copyright protection strategies and underscore the necessity for increased scrutiny and preventative measures against the misuse of DMs.
- [566] arXiv:2401.04138 (cross-list from cs.HC) [ pdf , other ]
-
Title: Expanding Horizons in HCI Research Through LLM-Driven Qualitative AnalysisSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: How would research be like if we still needed to "send" papers typed with a typewriter? Our life and research environment have continually evolved, often accompanied by controversial opinions about new methodologies. In this paper, we embrace this change by introducing a new approach to qualitative analysis in HCI using Large Language Models (LLMs). We detail a method that uses LLMs for qualitative data analysis and present a quantitative framework using SBART cosine similarity for performance evaluation. Our findings indicate that LLMs not only match the efficacy of traditional analysis methods but also offer unique insights. Through a novel dataset and benchmark, we explore LLMs' characteristics in HCI research, suggesting potential avenues for further exploration and application in the field.
- [567] arXiv:2401.04141 (cross-list from cs.LG) [ pdf , other ]
-
Title: On The Potential of The Fractal Geometry and The CNNs Ability to Encode itSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The fractal dimension provides a statistical index of object complexity by studying how the pattern changes with the measuring scale. Although useful in several classification tasks, the fractal dimension is under-explored in deep learning applications. In this work, we investigate the features that are learned by deep models and we study whether these deep networks are able to encode features as complex and high-level as the fractal dimensions. Specifically, we conduct a correlation analysis experiment to show that deep networks are not able to extract such a feature in none of their layers. We combine our analytical study with a human evaluation to investigate the differences between deep learning networks and models that operate on the fractal feature solely. Moreover, we show the effectiveness of fractal features in applications where the object structure is crucial for the classification task. We empirically show that training a shallow network on fractal features achieves performance comparable, even superior in specific cases, to that of deep networks trained on raw data while requiring less computational resources. Fractals improved the accuracy of the classification by 30% on average while requiring up to 84% less time to train. We couple our empirical study with a complexity analysis of the computational cost of extracting the proposed fractal features, and we study its limitation.
- [568] arXiv:2401.04144 (cross-list from cs.LG) [ pdf , other ]
-
Title: Robust Calibration For Improved Weather Prediction Under Distributional ShiftComments: Presented at the Bayesian Deep Learning workshop at NeurIPS 2021Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we present results on improving out-of-domain weather prediction and uncertainty estimation as part of the \texttt{Shifts Challenge on Robustness and Uncertainty under Real-World Distributional Shift} challenge. We find that by leveraging a mixture of experts in conjunction with an advanced data augmentation technique borrowed from the computer vision domain, in conjunction with robust \textit{post-hoc} calibration of predictive uncertainties, we can potentially achieve more accurate and better-calibrated results with deep neural networks than with boosted tree models for tabular data. We quantify our predictions using several metrics and propose several future lines of inquiry and experimentation to boost performance.
- [569] arXiv:2401.04145 (cross-list from cs.LG) [ pdf , other ]
-
Title: Learn Once Plan Arbitrarily (LOPA): Attention-Enhanced Deep Reinforcement Learning Method for Global Path PlanningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Deep reinforcement learning (DRL) methods have recently shown promise in path planning tasks. However, when dealing with global planning tasks, these methods face serious challenges such as poor convergence and generalization. To this end, we propose an attention-enhanced DRL method called LOPA (Learn Once Plan Arbitrarily) in this paper. Firstly, we analyze the reasons of these problems from the perspective of DRL's observation, revealing that the traditional design causes DRL to be interfered by irrelevant map information. Secondly, we develop the LOPA which utilizes a novel attention-enhanced mechanism to attain an improved attention capability towards the key information of the observation. Such a mechanism is realized by two steps: (1) an attention model is built to transform the DRL's observation into two dynamic views: local and global, significantly guiding the LOPA to focus on the key information on the given maps; (2) a dual-channel network is constructed to process these two views and integrate them to attain an improved reasoning capability. The LOPA is validated via multi-objective global path planning experiments. The result suggests the LOPA has improved convergence and generalization performance as well as great path planning efficiency.
- [570] arXiv:2401.04148 (cross-list from cs.LG) [ pdf , other ]
-
Title: Online Test-Time Adaptation of Spatial-Temporal Traffic Flow ForecastingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Accurate spatial-temporal traffic flow forecasting is crucial in aiding traffic managers in implementing control measures and assisting drivers in selecting optimal travel routes. Traditional deep-learning based methods for traffic flow forecasting typically rely on historical data to train their models, which are then used to make predictions on future data. However, the performance of the trained model usually degrades due to the temporal drift between the historical and future data. To make the model trained on historical data better adapt to future data in a fully online manner, this paper conducts the first study of the online test-time adaptation techniques for spatial-temporal traffic flow forecasting problems. To this end, we propose an Adaptive Double Correction by Series Decomposition (ADCSD) method, which first decomposes the output of the trained model into seasonal and trend-cyclical parts and then corrects them by two separate modules during the testing phase using the latest observed data entry by entry. In the proposed ADCSD method, instead of fine-tuning the whole trained model during the testing phase, a lite network is attached after the trained model, and only the lite network is fine-tuned in the testing process each time a data entry is observed. Moreover, to satisfy that different time series variables may have different levels of temporal drift, two adaptive vectors are adopted to provide different weights for different time series variables. Extensive experiments on four real-world traffic flow forecasting datasets demonstrate the effectiveness of the proposed ADCSD method. The code is available at this https URL .
- [571] arXiv:2401.04152 (cross-list from cs.SD) [ pdf , other ]
-
Title: Cross-Speaker Encoding Network for Multi-Talker Speech RecognitionComments: Accepted by ICASSP2024Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Abstract: End-to-end multi-talker speech recognition has garnered great interest as an effective approach to directly transcribe overlapped speech from multiple speakers. Current methods typically adopt either 1) single-input multiple-output (SIMO) models with a branched encoder, or 2) single-input single-output (SISO) models based on attention-based encoder-decoder architecture with serialized output training (SOT). In this work, we propose a Cross-Speaker Encoding (CSE) network to address the limitations of SIMO models by aggregating cross-speaker representations. Furthermore, the CSE model is integrated with SOT to leverage both the advantages of SIMO and SISO while mitigating their drawbacks. To the best of our knowledge, this work represents an early effort to integrate SIMO and SISO for multi-talker speech recognition. Experiments on the two-speaker LibrispeechMix dataset show that the CES model reduces word error rate (WER) by 8% over the SIMO baseline. The CSE-SOT model reduces WER by 10% overall and by 16% on high-overlap speech compared to the SOT model.
- [572] arXiv:2401.04154 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video ClassificationAuthors: Wentao ZhuComments: Accepted by WACV 2024; well-formatted PDF is in this https URL arXiv admin note: text overlap with arXiv:2401.04023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: Audio and video are two most common modalities in the mainstream media platforms, e.g., YouTube. To learn from multimodal videos effectively, in this work, we propose a novel audio-video recognition approach termed audio video Transformer, AVT, leveraging the effective spatio-temporal representation by the video Transformer to improve action recognition accuracy. For multimodal fusion, simply concatenating multimodal tokens in a cross-modal Transformer requires large computational and memory resources, instead we reduce the cross-modality complexity through an audio-video bottleneck Transformer. To improve the learning efficiency of multimodal Transformer, we integrate self-supervised objectives, i.e., audio-video contrastive learning, audio-video matching, and masked audio and video learning, into AVT training, which maps diverse audio and video representations into a common multimodal representation space. We further propose a masked audio segment loss to learn semantic audio activities in AVT. Extensive experiments and ablation studies on three public datasets and two in-house datasets consistently demonstrate the effectiveness of the proposed AVT. Specifically, AVT outperforms its previous state-of-the-art counterparts on Kinetics-Sounds by 8%. AVT also surpasses one of the previous state-of-the-art video Transformers [25] by 10% on VGGSound by leveraging the audio signal. Compared to one of the previous state-of-the-art multimodal methods, MBT [32], AVT is 1.3% more efficient in terms of FLOPs and improves the accuracy by 3.8% on Epic-Kitchens-100.
- [573] arXiv:2401.04192 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Interactive Multi-Objective Evolutionary Optimization of Software ArchitecturesComments: 41 pages, 5 figures, journal "Information Sciences"Journal-ref: Information Sciences, vol. 463-464, pp. 92-109, 2018Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: While working on a software specification, designers usually need to evaluate different architectural alternatives to be sure that quality criteria are met. Even when these quality aspects could be expressed in terms of multiple software metrics, other qualitative factors cannot be numerically measured, but they are extracted from the engineer's know-how and prior experiences. In fact, detecting not only strong but also weak points in the different solutions seems to fit better with the way humans make their decisions. Putting the human in the loop brings new challenges to the search-based software engineering field, especially for those human-centered activities within the early analysis phase. This paper explores how the interactive evolutionary computation can serve as a basis for integrating the human's judgment into the search process. An interactive approach is proposed to discover software architectures, in which both quantitative and qualitative criteria are applied to guide a multi-objective evolutionary algorithm. The obtained feedback is incorporated into the fitness function using architectural preferences allowing the algorithm to discern between promising and poor solutions. Experimentation with real users has revealed that the proposed interaction mechanism can effectively guide the search towards those regions of the search space that are of real interest to the expert.
- [574] arXiv:2401.04198 (cross-list from cs.LG) [ pdf , other ]
-
Title: Curiosity & Entropy Driven Unsupervised RL in Multiple EnvironmentsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The authors of 'Unsupervised Reinforcement Learning in Multiple environments' propose a method, alpha-MEPOL, to tackle unsupervised RL across multiple environments. They pre-train a task-agnostic exploration policy using interactions from an entire environment class and then fine-tune this policy for various tasks using supervision. We expanded upon this work, with the goal of improving performance. We primarily propose and experiment with five new modifications to the original work: sampling trajectories using an entropy-based probability distribution, dynamic alpha, higher KL Divergence threshold, curiosity-driven exploration, and alpha-percentile sampling on curiosity. Dynamic alpha and higher KL-Divergence threshold both provided a significant improvement over the baseline from the earlier work. PDF-sampling failed to provide any improvement due to it being approximately equivalent to the baseline method when the sample space is small. In high-dimensional environments, the addition of curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more. However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent. Overall, some of our experiments did boost performance over the baseline and there are a few directions that seem promising for further research.
- [575] arXiv:2401.04206 (cross-list from cs.HC) [ pdf , other ]
-
Title: Effects of Multimodal Explanations for Autonomous Driving on Driving Performance, Cognitive Load, Expertise, Confidence, and TrustComments: 16 pagesSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Advances in autonomous driving provide an opportunity for AI-assisted driving instruction that directly addresses the critical need for human driving improvement. How should an AI instructor convey information to promote learning? In a pre-post experiment (n = 41), we tested the impact of an AI Coach's explanatory communications modeled after performance driving expert instructions. Participants were divided into four (4) groups to assess two (2) dimensions of the AI coach's explanations: information type ('what' and 'why'-type explanations) and presentation modality (auditory and visual). We compare how different explanatory techniques impact driving performance, cognitive load, confidence, expertise, and trust via observational learning. Through interview, we delineate participant learning processes. Results show AI coaching can effectively teach performance driving skills to novices. We find the type and modality of information influences performance outcomes. Differences in how successfully participants learned are attributed to how information directs attention, mitigates uncertainty, and influences overload experienced by participants. Results suggest efficient, modality-appropriate explanations should be opted for when designing effective HMI communications that can instruct without overwhelming. Further, results support the need to align communications with human learning and cognitive processes. We provide eight design implications for future autonomous vehicle HMI and AI coach design.
- [576] arXiv:2401.04210 (cross-list from cs.CV) [ pdf , other ]
-
Title: FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the WildComments: 22 pages, 14 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: Automatically understanding funny moments (i.e., the moments that make people laugh) when watching comedy is challenging, as they relate to various features, such as body language, dialogues and culture. In this paper, we propose FunnyNet-W, a model that relies on cross- and self-attention for visual, audio and text data to predict funny moments in videos. Unlike most methods that rely on ground truth data in the form of subtitles, in this work we exploit modalities that come naturally with videos: (a) video frames as they contain visual information indispensable for scene understanding, (b) audio as it contains higher-level cues associated with funny moments, such as intonation, pitch and pauses and (c) text automatically extracted with a speech-to-text model as it can provide rich information when processed by a Large Language Model. To acquire labels for training, we propose an unsupervised approach that spots and labels funny audio moments. We provide experiments on five datasets: the sitcoms TBBT, MHD, MUStARD, Friends, and the TED talk UR-Funny. Extensive experiments and analysis show that FunnyNet-W successfully exploits visual, auditory and textual cues to identify funny moments, while our findings reveal FunnyNet-W's ability to predict funny moments in the wild. FunnyNet-W sets the new state of the art for funny moment detection with multimodal cues on all datasets with and without using ground truth information.
- [577] arXiv:2401.04247 (cross-list from cs.CV) [ pdf , other ]
-
Title: Robust Image Watermarking using Stable DiffusionAuthors: Lijun Zhang , Xiao Liu , Antoni Viros Martin , Cindy Xiong Bearfield , Yuriy Brun , Hui GuanComments: 15 pages, 14 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Watermarking images is critical for tracking image provenance and claiming ownership. With the advent of generative models, such as stable diffusion, able to create fake but realistic images, watermarking has become particularly important, e.g., to make generated images reliably identifiable. Unfortunately, the very same stable diffusion technology can remove watermarks injected using existing methods. To address this problem, we present a ZoDiac, which uses a pre-trained stable diffusion model to inject a watermark into the trainable latent space, resulting in watermarks that can be reliably detected in the latent vector, even when attacked. We evaluate ZoDiac on three benchmarks, MS-COCO, DiffusionDB, and WikiArt, and find that ZoDiac is robust against state-of-the-art watermark attacks, with a watermark detection rate over 98% and a false positive rate below 6.4%, outperforming state-of-the-art watermarking methods. Our research demonstrates that stable diffusion is a promising approach to robust watermarking, able to withstand even stable-diffusion-based attacks.
- [578] arXiv:2401.04290 (cross-list from cs.CV) [ pdf , other ]
-
Title: StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent EnvironmentsComments: Published in CVPR 23'Journal-ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: Spatial reasoning tasks in multi-agent environments such as event prediction, agent type identification, or missing data imputation are important for multiple applications (e.g., autonomous surveillance over sensor networks and subtasks for reinforcement learning (RL)). StarCraft II game replays encode intelligent (and adversarial) multi-agent behavior and could provide a testbed for these tasks; however, extracting simple and standardized representations for prototyping these tasks is laborious and hinders reproducibility. In contrast, MNIST and CIFAR10, despite their extreme simplicity, have enabled rapid prototyping and reproducibility of ML methods. Following the simplicity of these datasets, we construct a benchmark spatial reasoning dataset based on StarCraft II replays that exhibit complex multi-agent behaviors, while still being as easy to use as MNIST and CIFAR10. Specifically, we carefully summarize a window of 255 consecutive game states to create 3.6 million summary images from 60,000 replays, including all relevant metadata such as game outcome and player races. We develop three formats of decreasing complexity: Hyperspectral images that include one channel for every unit type (similar to multispectral geospatial images), RGB images that mimic CIFAR10, and grayscale images that mimic MNIST. We show how this dataset can be used for prototyping spatial reasoning methods. All datasets, code for extraction, and code for dataset loading can be found at this https URL
- [579] arXiv:2401.04319 (cross-list from cs.CL) [ pdf , other ]
-
Title: Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMsAuthors: Junjie Wang , Dan Yang , Binbin Hu , Yue Shen , Ziqi Liu , Wen Zhang , Jinjie Gu , Zhiqiang ZhangComments: Under reviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we explore a new way for user targeting, where non-expert marketers could select their target users solely given demands in natural language form. The key to this issue is how to transform natural languages into practical structured logical languages, i.e., the structured understanding of marketer demands. Considering the impressive natural language processing ability of large language models (LLMs), we try to leverage LLMs to solve this issue. Past research indicates that the reasoning ability of LLMs can be effectively enhanced through chain-of-thought (CoT) prompting. But existing methods still have some limitations: (1) Previous methods either use simple "Let's think step by step" spells or provide fixed examples in demonstrations without considering compatibility between prompts and questions, making LLMs ineffective in some complex reasoning tasks such as structured language transformation. (2) Previous methods are often implemented in closed-source models or excessively large models, which is not suitable in industrial practical scenarios. Based on these, we propose ARALLM (i.e., Analogical Reasoning Augmented Large Language Models) consisting of two modules: Analogical Reasoning based Prompting and Reasoning-Augmented Multi-Task Model Distillation.
- [580] arXiv:2401.04330 (cross-list from cs.CV) [ pdf , other ]
-
Title: BD-MSA: Body decouple VHR Remote Sensing Image Change Detection method guided by multi-scale feature information aggregationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The purpose of remote sensing image change detection (RSCD) is to detect differences between bi-temporal images taken at the same place. Deep learning has been extensively used to RSCD tasks, yielding significant results in terms of result recognition. However, due to the shooting angle of the satellite, the impacts of thin clouds, and certain lighting conditions, the problem of fuzzy edges in the change region in some remote sensing photographs cannot be properly handled using current RSCD algorithms. To solve this issue, we proposed a Body Decouple Multi-Scale by fearure Aggregation change detection (BD-MSA), a novel model that collects both global and local feature map information in the channel and space dimensions of the feature map during the training and prediction phases. This approach allows us to successfully extract the change region's boundary information while also divorcing the change region's main body from its boundary. Numerous studies have shown that the assessment metrics and evaluation effects of the model described in this paper on the publicly available datasets DSIFN-CD, S2Looking and WHU-CD are the best when compared to other models.
- [581] arXiv:2401.04331 (cross-list from cs.LG) [ pdf , other ]
-
Title: Coupling Graph Neural Networks with Fractional Order Continuous Dynamics: A Robustness StudyAuthors: Qiyu Kang , Kai Zhao , Yang Song , Yihang Xie , Yanan Zhao , Sijie Wang , Rui She , Wee Peng TayComments: in Proc. AAAI Conference on Artificial Intelligence, Vancouver, Canada, Feb. 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this work, we rigorously investigate the robustness of graph neural fractional-order differential equation (FDE) models. This framework extends beyond traditional graph neural (integer-order) ordinary differential equation (ODE) models by implementing the time-fractional Caputo derivative. Utilizing fractional calculus allows our model to consider long-term memory during the feature updating process, diverging from the memoryless Markovian updates seen in traditional graph neural ODE models. The superiority of graph neural FDE models over graph neural ODE models has been established in environments free from attacks or perturbations. While traditional graph neural ODE models have been verified to possess a degree of stability and resilience in the presence of adversarial attacks in existing literature, the robustness of graph neural FDE models, especially under adversarial conditions, remains largely unexplored. This paper undertakes a detailed assessment of the robustness of graph neural FDE models. We establish a theoretical foundation outlining the robustness characteristics of graph neural FDE models, highlighting that they maintain more stringent output perturbation bounds in the face of input and graph topology disturbances, compared to their integer-order counterparts. Our empirical evaluations further confirm the enhanced robustness of graph neural FDE models, highlighting their potential in adversarially robust applications.
- [582] arXiv:2401.04334 (cross-list from cs.RO) [ pdf , other ]
-
Title: Large Language Models for Robotics: Opportunities, Challenges, and PerspectivesAuthors: Jiaqi Wang , Zihao Wu , Yiwei Li , Hanqi Jiang , Peng Shu , Enze Shi , Huawen Hu , Chong Ma , Yiheng Liu , Xuhui Wang , Yincheng Yao , Xuan Liu , Huaqin Zhao , Zhengliang Liu , Haixing Dai , Lin Zhao , Bao Ge , Xiang Li , Tianming Liu , Shu ZhangSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have undergone significant expansion and have been increasingly integrated across various domains. Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions. However, for embodied tasks, where robots interact with complex environments, text-only LLMs often face challenges due to a lack of compatibility with robotic visual perception. This study provides a comprehensive overview of the emerging integration of LLMs and multimodal LLMs into various robotic tasks. Additionally, we propose a framework that utilizes multimodal GPT-4V to enhance embodied task planning through the combination of natural language instructions and robot visual perceptions. Our results, based on diverse datasets, indicate that GPT-4V effectively enhances robot performance in embodied tasks. This extensive survey and evaluation of LLMs and multimodal LLMs across a variety of robotic tasks enriches the understanding of LLM-centric embodied intelligence and provides forward-looking insights toward bridging the gap in Human-Robot-Environment interaction.
- [583] arXiv:2401.04336 (cross-list from cs.LG) [ pdf , other ]
-
Title: Deep Efficient Private Neighbor Generation for Subgraph Federated LearningComments: Accepted to SDM 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Behemoth graphs are often fragmented and separately stored by multiple data owners as distributed subgraphs in many realistic applications. Without harming data privacy, it is natural to consider the subgraph federated learning (subgraph FL) scenario, where each local client holds a subgraph of the entire global graph, to obtain globally generalized graph mining models. To overcome the unique challenge of incomplete information propagation on local subgraphs due to missing cross-subgraph neighbors, previous works resort to the augmentation of local neighborhoods through the joint FL of missing neighbor generators and GNNs. Yet their technical designs have profound limitations regarding the utility, efficiency, and privacy goals of FL. In this work, we propose FedDEP to comprehensively tackle these challenges in subgraph FL. FedDEP consists of a series of novel technical designs: (1) Deep neighbor generation through leveraging the GNN embeddings of potential missing neighbors; (2) Efficient pseudo-FL for neighbor generation through embedding prototyping; and (3) Privacy protection through noise-less edge-local-differential-privacy. We analyze the correctness and efficiency of FedDEP, and provide theoretical guarantees on its privacy. Empirical results on four real-world datasets justify the clear benefits of proposed techniques.
- [584] arXiv:2401.04339 (cross-list from cs.CV) [ pdf , other ]
-
Title: Memory-Efficient Personalization using Quantized Diffusion ModelComments: 20 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The rise of billion-parameter diffusion models like Stable Diffusion XL, Imagen, and Dall-E3 markedly advances the field of generative AI. However, their large-scale nature poses challenges in fine-tuning and deployment due to high resource demands and slow inference speed. This paper ventures into the relatively unexplored yet promising realm of fine-tuning quantized diffusion models. We establish a strong baseline by customizing three models: PEQA for fine-tuning quantization parameters, Q-Diffusion for post-training quantization, and DreamBooth for personalization. Our analysis reveals a notable trade-off between subject and prompt fidelity within the baseline model. To address these issues, we introduce two strategies, inspired by the distinct roles of different timesteps in diffusion models: S1 optimizing a single set of fine-tuning parameters exclusively at selected intervals, and S2 creating multiple fine-tuning parameter sets, each specialized for different timestep intervals. Our approach not only enhances personalization but also upholds prompt fidelity and image quality, significantly outperforming the baseline qualitatively and quantitatively. The code will be made publicly available.
- [585] arXiv:2401.04351 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Change Point Detection Integrated Remaining Useful Life Estimation Model under Variable Operating ConditionsComments: Accepted in Control Engineering Practice Journal with DOI: this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: By informing the onset of the degradation process, health status evaluation serves as a significant preliminary step for reliable remaining useful life (RUL) estimation of complex equipment. This paper proposes a novel temporal dynamics learning-based model for detecting change points of individual devices, even under variable operating conditions, and utilises the learnt change points to improve the RUL estimation accuracy. During offline model development, the multivariate sensor data are decomposed to learn fused temporal correlation features that are generalisable and representative of normal operation dynamics across multiple operating conditions. Monitoring statistics and control limit thresholds for normal behaviour are dynamically constructed from these learnt temporal features for the unsupervised detection of device-level change points. The detected change points then inform the degradation data labelling for training a long short-term memory (LSTM)-based RUL estimation model. During online monitoring, the temporal correlation dynamics of a query device is monitored for breach of the control limit derived in offline training. If a change point is detected, the device's RUL is estimated with the well-trained offline model for early preventive action. Using C-MAPSS turbofan engines as the case study, the proposed method improved the accuracy by 5.6\% and 7.5\% for two scenarios with six operating conditions, when compared to existing LSTM-based RUL estimation models that do not consider heterogeneous change points.
- [586] arXiv:2401.04357 (cross-list from cs.CV) [ pdf , other ]
-
Title: Iterative Feedback Network for Unsupervised Point Cloud RegistrationComments: 8 pages, accepted by RAL 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: As a fundamental problem in computer vision, point cloud registration aims to seek the optimal transformation for aligning a pair of point clouds. In most existing methods, the information flows are usually forward transferring, thus lacking the guidance from high-level information to low-level information. Besides, excessive high-level information may be overly redundant, and directly using it may conflict with the original low-level information. In this paper, we propose a novel Iterative Feedback Network (IFNet) for unsupervised point cloud registration, in which the representation of low-level features is efficiently enriched by rerouting subsequent high-level features. Specifically, our IFNet is built upon a series of Feedback Registration Block (FRB) modules, with each module responsible for generating the feedforward rigid transformation and feedback high-level features. These FRB modules are cascaded and recurrently unfolded over time. Further, the Feedback Transformer is designed to efficiently select relevant information from feedback high-level features, which is utilized to refine the low-level features. What's more, we incorporate a geometry-awareness descriptor to empower the network for making full use of most geometric information, which leads to more precise registration results. Extensive experiments on various benchmark datasets demonstrate the superior registration performance of our IFNet.
- [587] arXiv:2401.04361 (cross-list from cs.CL) [ pdf , other ]
-
Title: Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive LearningComments: Accepted by AAAI 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Knowledge-grounded dialogue (KGD) learns to generate an informative response based on a given dialogue context and external knowledge (\emph{e.g.}, knowledge graphs; KGs). Recently, the emergence of large language models (LLMs) and pre-training techniques has brought great success to knowledge-grounded dialogue. However, when building KGD systems in real applications, there are various real-world noises that are inevitable to face. For example, the dialogue context might involve perturbations such as misspellings and abbreviations. In addition, KGs typically suffer from incompletion and also might contain erroneous and outdated facts. Such real-world noises pose a challenge to the robustness of KGD systems and hinder their applications in the real world. In this paper, we propose an entity-based contrastive learning framework for improving the robustness of KGD. Specifically, we make use of the entity information in a KGD sample to create both its positive and negative samples which involve semantic-irrelevant and semantic-relevant perturbations, respectively. The contrastive learning framework ensures the KGD model is aware of these two types of perturbations, thus generating informative responses with the potentially noisy inputs in real applications. Experimental results on three benchmark datasets show that our method achieves new state-of-the-art performance in terms of automatic evaluation scores, verifying its effectiveness and potentiality. Furthermore, we show that our method can generate better responses than comparison models in both the noisy and the few-shot settings.
- [588] arXiv:2401.04362 (cross-list from cs.CV) [ pdf , other ]
-
Title: Representative Feature Extraction During Diffusion Process for Sketch Extraction with One ExampleComments: 8 pages(main paper), 8 pages(supplementary material)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: We introduce DiffSketch, a method for generating a variety of stylized sketches from images. Our approach focuses on selecting representative features from the rich semantics of deep features within a pretrained diffusion model. This novel sketch generation method can be trained with one manual drawing. Furthermore, efficient sketch extraction is ensured by distilling a trained generator into a streamlined extractor. We select denoising diffusion features through analysis and integrate these selected features with VAE features to produce sketches. Additionally, we propose a sampling scheme for training models using a conditional generative approach. Through a series of comparisons, we verify that distilled DiffSketch not only outperforms existing state-of-the-art sketch extraction methods but also surpasses diffusion-based stylization methods in the task of extracting sketches.
- [589] arXiv:2401.04385 (cross-list from cs.LG) [ pdf , other ]
-
Title: Machine unlearning through fine-grained model parameters perturbationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Machine unlearning techniques, which involve retracting data records and reducing influence of said data on trained models, help with the user privacy protection objective but incur significant computational costs. Weight perturbation-based unlearning is a general approach, but it typically involves globally modifying the parameters. We propose fine-grained Top-K and Random-k parameters perturbed inexact machine unlearning strategies that address the privacy needs while keeping the computational costs tractable.
In order to demonstrate the efficacy of our strategies we also tackle the challenge of evaluating the effectiveness of machine unlearning by considering the model's generalization performance across both unlearning and remaining data. To better assess the unlearning effect and model generalization, we propose novel metrics, namely, the forgetting rate and memory retention rate. However, for inexact machine unlearning, current metrics are inadequate in quantifying the degree of forgetting that occurs after unlearning strategies are applied. To address this, we introduce SPD-GAN, which subtly perturbs the distribution of data targeted for unlearning. Then, we evaluate the degree of unlearning by measuring the performance difference of the models on the perturbed unlearning data before and after the unlearning process. By implementing these innovative techniques and metrics, we achieve computationally efficacious privacy protection in machine learning applications without significant sacrifice of model performance. Furthermore, this approach provides a novel method for evaluating the degree of unlearning. - [590] arXiv:2401.04402 (cross-list from cs.LG) [ pdf , other ]
-
Title: IGNITE: Individualized GeNeration of Imputations in Time-series Electronic health recordsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Electronic Health Records present a valuable modality for driving personalized medicine, where treatment is tailored to fit individual-level differences. For this purpose, many data-driven machine learning and statistical models rely on the wealth of longitudinal EHRs to study patients' physiological and treatment effects. However, longitudinal EHRs tend to be sparse and highly missing, where missingness could also be informative and reflect the underlying patient's health status. Therefore, the success of data-driven models for personalized medicine highly depends on how the EHR data is represented from physiological data, treatments, and the missing values in the data. To this end, we propose a novel deep-learning model that learns the underlying patient dynamics over time across multivariate data to generate personalized realistic values conditioning on an individual's demographic characteristics and treatments. Our proposed model, IGNITE (Individualized GeNeration of Imputations in Time-series Electronic health records), utilises a conditional dual-variational autoencoder augmented with dual-stage attention to generate missing values for an individual. In IGNITE, we further propose a novel individualized missingness mask (IMM), which helps our model generate values based on the individual's observed data and missingness patterns. We further extend the use of IGNITE from imputing missingness to a personalized data synthesizer, where it generates missing EHRs that were never observed prior or even generates new patients for various applications. We validate our model on three large publicly available datasets and show that IGNITE outperforms state-of-the-art approaches in missing data reconstruction and task prediction.
- [591] arXiv:2401.04405 (cross-list from cs.MM) [ pdf , other ]
-
Title: Optimal Transcoding Resolution Prediction for Efficient Per-Title Bitrate Ladder EstimationComments: Accepted by the 2024 Data Compression Conference (DCC) for presentation as a poster. This is the full paperSubjects: Multimedia (cs.MM) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Abstract: Adaptive video streaming requires efficient bitrate ladder construction to meet heterogeneous network conditions and end-user demands. Per-title optimized encoding typically traverses numerous encoding parameters to search the Pareto-optimal operating points for each video. Recently, researchers have attempted to predict the content-optimized bitrate ladder for pre-encoding overhead reduction. However, existing methods commonly estimate the encoding parameters on the Pareto front and still require subsequent pre-encodings. In this paper, we propose to directly predict the optimal transcoding resolution at each preset bitrate for efficient bitrate ladder construction. We adopt a Temporal Attentive Gated Recurrent Network to capture spatial-temporal features and predict transcoding resolutions as a multi-task classification problem. We demonstrate that content-optimized bitrate ladders can thus be efficiently determined without any pre-encoding. Our method well approximates the ground-truth bitrate-resolution pairs with a slight Bjøntegaard Delta rate loss of 1.21% and significantly outperforms the state-of-the-art fixed ladder.
- [592] arXiv:2401.04422 (cross-list from cs.CL) [ pdf , other ]
-
Title: Estimating Text Similarity based on Semantic Concept EmbeddingsJournal-ref: IARIA Congress Proceedings, 2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Due to their ease of use and high accuracy, Word2Vec (W2V) word embeddings enjoy great success in the semantic representation of words, sentences, and whole documents as well as for semantic similarity estimation. However, they have the shortcoming that they are directly extracted from a surface representation, which does not adequately represent human thought processes and also performs poorly for highly ambiguous words. Therefore, we propose Semantic Concept Embeddings (CE) based on the MultiNet Semantic Network (SN) formalism, which addresses both shortcomings. The evaluation on a marketing target group distribution task showed that the accuracy of predicted target groups can be increased by combining traditional word embeddings with semantic CEs.
- [593] arXiv:2401.04437 (cross-list from cs.CV) [ pdf , other ]
-
Title: Empirical Analysis of Anomaly Detection on Hyperspectral Imaging Using Dimension Reduction MethodsComments: 4 pages, 4 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recent studies try to use hyperspectral imaging (HSI) to detect foreign matters in products because it enables to visualize the invisible wavelengths including ultraviolet and infrared. Considering the enormous image channels of the HSI, several dimension reduction methods-e.g., PCA or UMAP-can be considered to reduce but those cannot ease the fundamental limitations, as follows: (1) latency of HSI capturing. (2) less explanation ability of the important channels. In this paper, to circumvent the aforementioned methods, one of the ways to channel reduction, on anomaly detection proposed HSI. Different from feature extraction methods (i.e., PCA or UMAP), feature selection can sort the feature by impact and show better explainability so we might redesign the task-optimized and cost-effective spectroscopic camera. Via the extensive experiment results with synthesized MVTec AD dataset, we confirm that the feature selection method shows 6.90x faster at the inference phase compared with feature extraction-based approaches while preserving anomaly detection performance. Ultimately, we conclude the advantage of feature selection which is effective yet fast.
- [594] arXiv:2401.04441 (cross-list from cs.CV) [ pdf , other ]
-
Title: Image classification network enhancement methods based on knowledge injectionComments: 7 pages, 3 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The current deep neural network algorithm still stays in the end-to-end training supervision method like Image-Label pairs, which makes traditional algorithm is difficult to explain the reason for the results, and the prediction logic is difficult to understand and analyze. The current algorithm does not use the existing human knowledge information, which makes the model not in line with the human cognition model and makes the model not suitable for human use. In order to solve the above problems, the present invention provides a deep neural network training method based on the human knowledge, which uses the human cognition model to construct the deep neural network training model, and uses the existing human knowledge information to construct the deep neural network training model. This paper proposes a multi-level hierarchical deep learning algorithm, which is composed of multi-level hierarchical deep neural network architecture and multi-level hierarchical deep learning framework. The experimental results show that the proposed algorithm can effectively explain the hidden information of the neural network. The goal of our study is to improve the interpretability of deep neural networks (DNNs) by providing an analysis of the impact of knowledge injection on the classification task. We constructed a knowledge injection dataset with matching knowledge data and image classification data. The knowledge injection dataset is the benchmark dataset for the experiments in the paper. Our model expresses the improvement in interpretability and classification task performance of hidden layers at different scales.
- [595] arXiv:2401.04468 (cross-list from cs.CV) [ pdf , other ]
-
Title: MagicVideo-V2: Multi-Stage High-Aesthetic Video GenerationAuthors: Weimin Wang , Jiawei Liu , Zhijie Lin , Jiangqiao Yan , Shuo Chen , Chetwin Low , Tuyen Hoang , Jie Wu , Jun Hao Liew , Hanshu Yan , Daquan Zhou , Jiashi FengSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The growing demand for high-fidelity video generation from textual descriptions has catalyzed significant research in this field. In this work, we introduce MagicVideo-V2 that integrates the text-to-image model, video motion generator, reference image embedding module and frame interpolation module into an end-to-end video generation pipeline. Benefiting from these architecture designs, MagicVideo-V2 can generate an aesthetically pleasing, high-resolution video with remarkable fidelity and smoothness. It demonstrates superior performance over leading Text-to-Video systems such as Runway, Pika 1.0, Morph, Moon Valley and Stable Video Diffusion model via user evaluation at large scale.
- [596] arXiv:2401.04472 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Survey on Efficient Federated Learning Methods for Foundation Model TrainingAuthors: Herbert Woisetschläger , Alexander Isenko , Shiqiang Wang , Ruben Mayer , Hans-Arno JacobsenSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Federated Learning (FL) has become an established technique to facilitate privacy-preserving collaborative training across a multitude of clients. However, new approaches to FL often discuss their contributions involving small deep-learning models only and focus on training full models on clients. In the wake of Foundation Models (FM), the reality is different for many deep learning applications. Typically, FMs have already been pre-trained across a wide variety of tasks and can be fine-tuned to specific downstream tasks over significantly smaller datasets than required for full model training. However, access to such datasets is often challenging. By its design, FL can help to open data silos. With this survey, we introduce a novel taxonomy focused on computational and communication efficiency, the vital elements to make use of FMs in FL systems. We discuss the benefits and drawbacks of parameter-efficient fine-tuning (PEFT) for FL applications, elaborate on the readiness of FL frameworks to work with FMs and provide future research opportunities on how to evaluate generative models in FL as well as the interplay of privacy and PEFT.
- [597] arXiv:2401.04474 (cross-list from cs.IR) [ pdf , other ]
-
Title: Combining Embedding-Based and Semantic-Based Models for Post-hoc Explanations in Recommender SystemsSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: In today's data-rich environment, recommender systems play a crucial role in decision support systems. They provide to users personalized recommendations and explanations about these recommendations. Embedding-based models, despite their widespread use, often suffer from a lack of interpretability, which can undermine trust and user engagement. This paper presents an approach that combines embedding-based and semantic-based models to generate post-hoc explanations in recommender systems, leveraging ontology-based knowledge graphs to improve interpretability and explainability. By organizing data within a structured framework, ontologies enable the modeling of intricate relationships between entities, which is essential for generating explanations. By combining embedding-based and semantic based models for post-hoc explanations in recommender systems, the framework we defined aims at producing meaningful and easy-to-understand explanations, enhancing user trust and satisfaction, and potentially promoting the adoption of recommender systems across the e-commerce sector.
- [598] arXiv:2401.04481 (cross-list from cs.CL) [ pdf , other ]
-
Title: Fighting Fire with Fire: Adversarial Prompting to Generate a Misinformation Detection DatasetSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The recent success in language generation capabilities of large language models (LLMs), such as GPT, Bard, Llama etc., can potentially lead to concerns about their possible misuse in inducing mass agitation and communal hatred via generating fake news and spreading misinformation. Traditional means of developing a misinformation ground-truth dataset does not scale well because of the extensive manual effort required to annotate the data. In this paper, we propose an LLM-based approach of creating silver-standard ground-truth datasets for identifying misinformation. Specifically speaking, given a trusted news article, our proposed approach involves prompting LLMs to automatically generate a summarised version of the original article. The prompts in our proposed approach act as a controlling mechanism to generate specific types of factual incorrectness in the generated summaries, e.g., incorrect quantities, false attributions etc. To investigate the usefulness of this dataset, we conduct a set of experiments where we train a range of supervised models for the task of misinformation detection.
- [599] arXiv:2401.04489 (cross-list from cs.LG) [ pdf , other ]
-
Title: Optimal Survival Trees: A Dynamic Programming ApproachComments: Published at AAAI-24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
Abstract: Survival analysis studies and predicts the time of death, or other singular unrepeated events, based on historical data, while the true time of death for some instances is unknown. Survival trees enable the discovery of complex nonlinear relations in a compact human comprehensible model, by recursively splitting the population and predicting a distinct survival distribution in each leaf node. We use dynamic programming to provide the first survival tree method with optimality guarantees, enabling the assessment of the optimality gap of heuristics. We improve the scalability of our method through a special algorithm for computing trees up to depth two. The experiments show that our method's run time even outperforms some heuristics for realistic cases while obtaining similar out-of-sample performance with the state-of-the-art.
- [600] arXiv:2401.04507 (cross-list from cs.CL) [ pdf , other ]
-
Title: TechGPT-2.0: A large language model project to solve the task of knowledge graph constructionAuthors: Jiaqi Wang , Yuying Chang , Zhong Li , Ning An , Qi Ma , Lei Hei , Haibo Luo , Yifei Lu , Feiliang RenSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models have exhibited robust performance across diverse natural language processing tasks. This report introduces TechGPT-2.0, a project designed to enhance the capabilities of large language models specifically in knowledge graph construction tasks, including named entity recognition (NER) and relationship triple extraction (RTE) tasks in NLP applications. Additionally, it serves as a LLM accessible for research within the Chinese open-source model community. We offer two 7B large language model weights and a QLoRA weight specialized for processing lengthy texts.Notably, TechGPT-2.0 is trained on Huawei's Ascend server. Inheriting all functionalities from TechGPT-1.0, it exhibits robust text processing capabilities, particularly in the domains of medicine and law. Furthermore, we introduce new capabilities to the model, enabling it to process texts in various domains such as geographical areas, transportation, organizations, literary works, biology, natural sciences, astronomical objects, and architecture. These enhancements also fortified the model's adeptness in handling hallucinations, unanswerable queries, and lengthy texts. This report provides a comprehensive and detailed introduction to the full fine-tuning process on Huawei's Ascend servers, encompassing experiences in Ascend server debugging, instruction fine-tuning data processing, and model training. Our code is available at this https URL
- [601] arXiv:2401.04515 (cross-list from cs.CL) [ pdf , other ]
-
Title: Exploring Prompt-Based Methods for Zero-Shot Hypernym Prediction with Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This article investigates a zero-shot approach to hypernymy prediction using large language models (LLMs). The study employs a method based on text probability calculation, applying it to various generated prompts. The experiments demonstrate a strong correlation between the effectiveness of language model prompts and classic patterns, indicating that preliminary prompt selection can be carried out using smaller models before moving to larger ones. We also explore prompts for predicting co-hyponyms and improving hypernymy predictions by augmenting prompts with additional information through automatically identified co-hyponyms. An iterative approach is developed for predicting higher-level concepts, which further improves the quality on the BLESS dataset (MAP = 0.8).
- [602] arXiv:2401.04518 (cross-list from cs.CL) [ pdf , other ]
-
Title: The Critique of CritiqueSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Critique, as a natural language description for assessing the quality of model-generated content, has been proven to play an essential role in the training, evaluation, and refinement of Large Language Models (LLMs). However, there is a lack of principled understanding in evaluating the quality of the critique itself. In this paper, we pioneer the critique of critique, termed MetaCritique, which is a framework to evaluate the critique from two aspects, i.e., factuality as precision score and comprehensiveness as recall score. We calculate the harmonic mean of precision and recall as the overall rating called F1 score. To obtain a reliable evaluation outcome, we propose Atomic Information Units (AIUs), which describe the critique in a more fine-grained manner. MetaCritique takes each AIU into account and aggregates each AIU's judgment for the overall score. Moreover, given the evaluation process involves intricate reasoning, our MetaCritique provides a natural language rationale to support each judgment. We construct a meta-evaluation dataset containing 300 critiques (2653 AIUs) across four tasks (question answering, reasoning, entailment, and summarization), and we conduct a comparative study to demonstrate the feasibility and effectiveness. Experiments also show superior critique judged by MetaCritique leads to better refinement, indicating generative artificial intelligence indeed has the potential to be significantly advanced with our MetaCritique. We will release relevant code and meta-evaluation datasets at this https URL .
- [603] arXiv:2401.04531 (cross-list from cs.CL) [ pdf , other ]
-
Title: MERA: A Comprehensive LLM Evaluation in RussianAuthors: Alena Fenogenova , Artem Chervyakov , Nikita Martynov , Anastasia Kozlova , Maria Tikhonova , Albina Akhmetgareeva , Anton Emelyanov , Denis Shevelev , Pavel Lebedev , Leonid Sinev , Ulyana Isaeva , Katerina Kolomeytseva , Daniil Moskovskiy , Elizaveta Goncharova , Nikita Savushkin , Polina Mikhailova , Denis Dimitrov , Alexander Panchenko , Sergei MarkovComments: The paper version comparable with the release code v.1.1.0 of the benchmark. The links and scores are updatedSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language. The benchmark encompasses 21 evaluation tasks for generative models in 11 skill domains and is designed as a black-box test to ensure the exclusion of data leakage. The paper introduces a methodology to evaluate FMs and LMs in zero- and few-shot fixed instruction settings that can be extended to other modalities. We propose an evaluation methodology, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open LMs as baselines and find that they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential societal drawbacks.
- [604] arXiv:2401.04536 (cross-list from cs.CL) [ pdf , other ]
-
Title: Evaluating Language Model Agency through NegotiationsAuthors: Tim R. Davidson , Veniamin Veselovsky , Martin Josifoski , Maxime Peyrard , Antoine Bosselut , Michal Kosinski , Robert WestComments: Accepted to ICLR 2024, code and link to project data are made available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We introduce an approach to evaluate language model (LM) agency using negotiation games. This approach better reflects real-world use cases and addresses some of the shortcomings of alternative LM benchmarks. Negotiation games enable us to study multi-turn, and cross-model interactions, modulate complexity, and side-step accidental evaluation data leakage. We use our approach to test six widely used and publicly accessible LMs, evaluating performance and alignment in both self-play and cross-play settings. Noteworthy findings include: (i) only closed-source models tested here were able to complete these tasks; (ii) cooperative bargaining games proved to be most challenging to the models; and (iii) even the most powerful models sometimes "lose" to weaker opponents
- [605] arXiv:2401.04575 (cross-list from cs.CV) [ pdf , other ]
-
Title: Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept UnderstandingAuthors: Yatong Bai , Utsav Garg , Apaar Shanker , Haoming Zhang , Samyak Parajuli , Erhan Bas , Isidora Filipovic , Amelia N. Chu , Eugenia D Fomitcheva , Elliot Branson , Aerin Kim , Somayeh Sojoudi , Kyunghyun ChoSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Vision and vision-language applications of neural networks, such as image classification and captioning, rely on large-scale annotated datasets that require non-trivial data-collecting processes. This time-consuming endeavor hinders the emergence of large-scale datasets, limiting researchers and practitioners to a small number of choices. Therefore, we seek more efficient ways to collect and annotate images. Previous initiatives have gathered captions from HTML alt-texts and crawled social media postings, but these data sources suffer from noise, sparsity, or subjectivity. For this reason, we turn to commercial shopping websites whose data meet three criteria: cleanliness, informativeness, and fluency. We introduce the Let's Go Shopping (LGS) dataset, a large-scale public dataset with 15 million image-caption pairs from publicly available e-commerce websites. When compared with existing general-domain datasets, the LGS images focus on the foreground object and have less complex backgrounds. Our experiments on LGS show that the classifiers trained on existing benchmark datasets do not readily generalize to e-commerce data, while specific self-supervised visual feature extractors can better generalize. Furthermore, LGS's high-quality e-commerce-focused images and bimodal nature make it advantageous for vision-language bi-modal tasks: LGS enables image-captioning models to generate richer captions and helps text-to-image generation models achieve e-commerce style transfer.
- [606] arXiv:2401.04577 (cross-list from cs.SD) [ pdf , other ]
-
Title: Masked Audio Generation using a Single Non-Autoregressive TransformerAuthors: Alon Ziv , Itai Gat , Gael Le Lan , Tal Remez , Felix Kreuk , Alexandre Défossez , Jade Copet , Gabriel Synnaeve , Yossi AdiSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens. Unlike prior work, MAGNeT is comprised of a single-stage, non-autoregressive transformer. During training, we predict spans of masked tokens obtained from a masking scheduler, while during inference we gradually construct the output sequence using several decoding steps. To further enhance the quality of the generated audio, we introduce a novel rescoring method in which, we leverage an external pre-trained model to rescore and rank predictions from MAGNeT, which will be then used for later decoding steps. Lastly, we explore a hybrid version of MAGNeT, in which we fuse between autoregressive and non-autoregressive models to generate the first few seconds in an autoregressive manner while the rest of the sequence is being decoded in parallel. We demonstrate the efficiency of MAGNeT for the task of text-to-music and text-to-audio generation and conduct an extensive empirical evaluation, considering both objective metrics and human studies. The proposed approach is comparable to the evaluated baselines, while being significantly faster (x7 faster than the autoregressive baseline). Through ablation studies and analysis, we shed light on the importance of each of the components comprising MAGNeT, together with pointing to the trade-offs between autoregressive and non-autoregressive modeling, considering latency, throughput, and generation quality. Samples are available on our demo page this https URL .
- [607] arXiv:2401.04620 (cross-list from cs.CL) [ pdf , other ]
-
Title: Agent Alignment in Evolving Social NormsComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Agents based on Large Language Models (LLMs) are increasingly permeating various domains of human production and life, highlighting the importance of aligning them with human values. The current alignment of AI systems primarily focuses on passively aligning LLMs through human intervention. However, agents possess characteristics like receiving environmental feedback and self-evolution, rendering the LLM alignment methods inadequate. In response, we propose an evolutionary framework for agent evolution and alignment, named EvolutionaryAgent, which transforms agent alignment into a process of evolution and selection under the principle of survival of the fittest. In an environment where social norms continuously evolve, agents better adapted to the current social norms will have a higher probability of survival and proliferation, while those inadequately aligned dwindle over time. Experimental results assessing the agents from multiple perspectives in aligning with social norms demonstrate that EvolutionaryAgent can align progressively better with the evolving social norms while maintaining its proficiency in general tasks. Effectiveness tests conducted on various open and closed-source LLMs as the foundation for agents also prove the applicability of our approach.
- [608] arXiv:2401.04621 (cross-list from cs.SE) [ pdf , other ]
-
Title: DebugBench: Evaluating Debugging Capability of Large Language ModelsAuthors: Runchu Tian , Yining Ye , Yujia Qin , Xin Cong , Yankai Lin , Yinxu Pan , Yesai Wu , Zhiyuan Liu , Maosong SunComments: in progressSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Large Language Models (LLMs) have demonstrated exceptional coding capability. However, as another critical component of programming proficiency, the debugging capability of LLMs remains relatively unexplored. Previous evaluations of LLMs' debugging ability are significantly limited by the risk of data leakage, the scale of the dataset, and the variety of tested bugs. To overcome these deficiencies, we introduce `DebugBench', an LLM debugging benchmark consisting of 4,253 instances. It covers four major bug categories and 18 minor types in C++, Java, and Python. To construct DebugBench, we collect code snippets from the LeetCode community, implant bugs into source data with GPT-4, and assure rigorous quality checks. We evaluate two commercial and three open-source models in a zero-shot scenario. We find that (1) while closed-source models like GPT-4 exhibit inferior debugging performance compared to humans, open-source models such as Code Llama fail to attain any pass rate scores; (2) the complexity of debugging notably fluctuates depending on the bug category; (3) incorporating runtime feedback has a clear impact on debugging performance which is not always helpful. As an extension, we also compare LLM debugging and code generation, revealing a strong correlation between them for closed-source models. These findings will benefit the development of LLMs in debugging.
- [609] arXiv:2401.04637 (cross-list from cs.SE) [ pdf , other ]
-
Title: Applying Large Language Models API to Issue Classification ProblemComments: 4 pages, 1 figure, NLBSE and ICSE conference submission, ACM formatted, pre printSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Effective prioritization of issue reports is crucial in software engineering to optimize resource allocation and address critical problems promptly. However, the manual classification of issue reports for prioritization is laborious and lacks scalability. Alternatively, many open source software (OSS) projects employ automated processes for this task, albeit relying on substantial datasets for adequate training. This research seeks to devise an automated approach that ensures reliability in issue prioritization, even when trained on smaller datasets. Our proposed methodology harnesses the power of Generative Pre-trained Transformers (GPT), recognizing their potential to efficiently handle this task. By leveraging the capabilities of such models, we aim to develop a robust system for prioritizing issue reports accurately, mitigating the necessity for extensive training data while maintaining reliability. In our research, we have developed a reliable GPT-based approach to accurately label and prioritize issue reports with a reduced training dataset. By reducing reliance on massive data requirements and focusing on few-shot fine-tuning, our methodology offers a more accessible and efficient solution for issue prioritization in software engineering. Our model predicted issue types in individual projects up to 93.2% in precision, 95% in recall, and 89.3% in F1-score.
- [610] arXiv:2401.04647 (cross-list from cs.CV) [ pdf , other ]
-
Title: Advancing Ante-Hoc Explainable Models through Generative Adversarial NetworksComments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024 ( this https URL ). Paper accepted and presented at Deployable AI Workshop at AAAI-2024 ( this https URL )Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper presents a novel concept learning framework for enhancing model interpretability and performance in visual classification tasks. Our approach appends an unsupervised explanation generator to the primary classifier network and makes use of adversarial training. During training, the explanation module is optimized to extract visual concepts from the classifier's latent representations, while the GAN-based module aims to discriminate images generated from concepts, from true images. This joint training scheme enables the model to implicitly align its internally learned concepts with human-interpretable visual properties. Comprehensive experiments demonstrate the robustness of our approach, while producing coherent concept activations. We analyse the learned concepts, showing their semantic concordance with object parts and visual attributes. We also study how perturbations in the adversarial training protocol impact both classification and concept acquisition. In summary, this work presents a significant step towards building inherently interpretable deep vision models with task-aligned concept representations - a key enabler for developing trustworthy AI for real-world perception tasks.
- [611] arXiv:2401.04648 (cross-list from cs.LG) [ pdf , other ]
-
Title: A novel framework for generalization of deep hidden physics modelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Analysis of PDEs (math.AP)
Abstract: Modelling of systems where the full system information is unknown is an oft encountered problem for various engineering and industrial applications, as it's either impossible to consider all the complex physics involved or simpler models are considered to keep within the limits of the available resources. Recent advances in greybox modelling like the deep hidden physics models address this space by combining data and physics. However, for most real-life applications, model generalizability is a key issue, as retraining a model for every small change in system inputs and parameters or modification in domain configuration can render the model economically unviable. In this work we present a novel enhancement to the idea of hidden physics models which can generalize for changes in system inputs, parameters and domains. We also show that this approach holds promise in system discovery as well and helps learn the hidden physics for the changed system inputs, parameters and domain configuration.
- [612] arXiv:2401.04658 (cross-list from cs.CL) [ pdf , other ]
-
Title: Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language ModelsComments: Technical Report. Yiran Zhong is the corresponding author. The source code is available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Linear attention is an efficient attention mechanism that has recently emerged as a promising alternative to conventional softmax attention. With its ability to process tokens in linear computational complexities, linear attention, in theory, can handle sequences of unlimited length without sacrificing speed, i.e., maintaining a constant training speed for various sequence lengths with a fixed memory consumption. However, due to the issue with cumulative summation (cumsum), current linear attention algorithms cannot demonstrate their theoretical advantage in a causal setting. In this paper, we present Lightning Attention-2, the first linear attention implementation that enables linear attention to realize its theoretical computational benefits. To achieve this, we leverage the thought of tiling, separately handling the intra-block and inter-block components in linear attention calculation. Specifically, we utilize the conventional attention computation mechanism for the intra-blocks and apply linear attention kernel tricks for the inter-blocks. A tiling technique is adopted through both forward and backward procedures to take full advantage of the GPU hardware. We implement our algorithm in Triton to make it IO-aware and hardware-friendly. Various experiments are conducted on different model sizes and sequence lengths. Lightning Attention-2 retains consistent training and inference speed regardless of input sequence length and is significantly faster than other attention mechanisms. The source code is available at this https URL .
- [613] arXiv:2401.04666 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Benchmark Analysis of Various Pre-trained Deep Learning Models on ASSIRA Cats and Dogs DatasetSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: As the most basic application and implementation of deep learning, image classification has grown in popularity. Various datasets are provided by renowned data science communities for benchmarking machine learning algorithms and pre-trained models. The ASSIRA Cats & Dogs dataset is one of them and is being used in this research for its overall acceptance and benchmark standards. A comparison of various pre-trained models is demonstrated by using different types of optimizers and loss functions. Hyper-parameters are changed to gain the best result from a model. By applying this approach, we have got higher accuracy without major changes in the training model. To run the experiment, we used three different computer architectures: a laptop equipped with NVIDIA GeForce GTX 1070, a laptop equipped with NVIDIA GeForce RTX 3080Ti, and a desktop equipped with NVIDIA GeForce RTX 3090. The acquired results demonstrate supremacy in terms of accuracy over the previously done experiments on this dataset. From this experiment, the highest accuracy which is 99.65% is gained using the NASNet Large.
- [614] arXiv:2401.04679 (cross-list from cs.CL) [ pdf , other ]
-
Title: RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust AdaptationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We investigate parameter-efficient fine-tuning (PEFT) methods that can provide good accuracy under limited computational and memory budgets in the context of large language models (LLMs). We present a new PEFT method called Robust Adaptation (RoSA) inspired by robust principal component analysis that jointly trains $\textit{low-rank}$ and $\textit{highly-sparse}$ components on top of a set of fixed pretrained weights to efficiently approximate the performance of a full-fine-tuning (FFT) solution. Across a series of challenging generative tasks such as grade-school math and SQL query generation, which require fine-tuning for good performance, we show that RoSA outperforms LoRA, pure sparse fine-tuning, and alternative hybrid methods at the same parameter budget, and can even recover the performance of FFT on some tasks. We provide system support for RoSA to complement the training algorithm, specifically in the form of sparse GPU kernels which enable memory- and computationally-efficient training, and show that it is also compatible with low-precision base weights, resulting in the first joint representation combining quantization, low-rank and sparse approximations. Our code is accessible at this https URL .
- [615] arXiv:2401.04728 (cross-list from cs.CV) [ pdf , other ]
-
Title: Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar CreationComments: [CVPR 2024] Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recent advances in generative diffusion models have enabled the previously unfeasible capability of generating 3D assets from a single input image or a text prompt. In this work, we aim to enhance the quality and functionality of these models for the task of creating controllable, photorealistic human avatars. We achieve this by integrating a 3D morphable model into the state-of-the-art multi-view-consistent diffusion approach. We demonstrate that accurate conditioning of a generative pipeline on the articulated 3D model enhances the baseline model performance on the task of novel view synthesis from a single image. More importantly, this integration facilitates a seamless and accurate incorporation of facial expression and body pose control into the generation process. To the best of our knowledge, our proposed framework is the first diffusion model to enable the creation of fully 3D-consistent, animatable, and photorealistic human avatars from a single image of an unseen subject; extensive quantitative and qualitative evaluations demonstrate the advantages of our approach over existing state-of-the-art avatar creation models on both novel view and novel expression synthesis tasks. The code for our project is publicly available.
- [616] arXiv:2401.04732 (cross-list from cs.IR) [ pdf , other ]
-
Title: A case study of Generative AI in MSX Sales Copilot: Improving seller productivity with a real-time question-answering system for content recommendationAuthors: Manpreet Singh , Ravdeep Pasricha , Nitish Singh , Ravi Prasad Kondapalli , Manoj R , Kiran R , Laurent BouéJournal-ref: Microsoft Journal of Applied Research, Volume 20, 2024Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this paper, we design a real-time question-answering system specifically targeted for helping sellers get relevant material/documentation they can share live with their customers or refer to during a call. Taking the Seismic content repository as a relatively large scale example of a diverse dataset of sales material, we demonstrate how LLM embeddings of sellers' queries can be matched with the relevant content. We achieve this by engineering prompts in an elaborate fashion that makes use of the rich set of meta-features available for documents and sellers. Using a bi-encoder with cross-encoder re-ranker architecture, we show how the solution returns the most relevant content recommendations in just a few seconds even for large datasets. Our recommender system is deployed as an AML endpoint for real-time inferencing and has been integrated into a Copilot interface that is now deployed in the production version of the Dynamics CRM, known as MSX, used daily by Microsoft sellers.
- [617] arXiv:2401.04736 (cross-list from eess.SY) [ pdf , ps , other ]
-
Title: Exploring Attack Resilience in Distributed Platoon Controllers with Model Predictive ControlAuthors: Tashfique Hasnine ChoudhuryComments: ThesisSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI)
Abstract: The extensive use of distributed vehicle platoon controllers has resulted in several benefits for transportation systems, such as increased traffic flow, fuel efficiency, and decreased pollution. The rising reliance on interconnected systems and communication networks, on the other hand, exposes these controllers to potential cyber-attacks, which may compromise their safety and functionality. This thesis aims to improve the security of distributed vehicle platoon controllers by investigating attack scenarios and assessing their influence on system performance. Various attack techniques, including man-in-the-middle (MITM) and false data injection (FDI), are simulated using Model Predictive Control (MPC) controller to identify vulnerabilities and weaknesses of the platoon controller. Countermeasures are offered and tested, that includes attack analysis and reinforced communication protocols using Machine Learning techniques for detection. The findings emphasize the significance of integrating security issues into their design and implementation, which helps to construct safe and resilient distributed platoon controllers.
- [618] arXiv:2401.04737 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: Music Genre Classification: A Comparative Analysis of CNN and XGBoost Approaches with Mel-frequency cepstral coefficients and Mel SpectrogramsAuthors: Yigang MengSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: In recent years, various well-designed algorithms have empowered music platforms to provide content based on one's preferences. Music genres are defined through various aspects, including acoustic features and cultural considerations. Music genre classification works well with content-based filtering, which recommends content based on music similarity to users. Given a considerable dataset, one premise is automatic annotation using machine learning or deep learning methods that can effectively classify audio files. The effectiveness of systems largely depends on feature and model selection, as different architectures and features can facilitate each other and yield different results. In this study, we conduct a comparative study investigating the performances of three models: a proposed convolutional neural network (CNN), the VGG16 with fully connected layers (FC), and an eXtreme Gradient Boosting (XGBoost) approach on different features: 30-second Mel spectrogram and 3-second Mel-frequency cepstral coefficients (MFCCs). The results show that the MFCC XGBoost model outperformed the others. Furthermore, applying data segmentation in the data preprocessing phase can significantly enhance the performance of the CNNs.
- [619] arXiv:2401.04739 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Content-Conditioned Generation of Stylized Free hand SketchesComments: 6 pages, 7 figures, ICSMDSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, the recognition of free-hand sketches has remained a popular task. However, in some special fields such as the military field, free-hand sketches are difficult to sample on a large scale. Common data augmentation and image generation techniques are difficult to produce images with various free-hand sketching styles. Therefore, the recognition and segmentation tasks in related fields are limited. In this paper, we propose a novel adversarial generative network that can accurately generate realistic free-hand sketches with various styles. We explore the performance of the model, including using styles randomly sampled from a prior normal distribution to generate images with various free-hand sketching styles, disentangling the painters' styles from known free-hand sketches to generate images with specific styles, and generating images of unknown classes that are not in the training set. We further demonstrate with qualitative and quantitative evaluations our advantages in visual quality, content accuracy, and style imitation on SketchIME.
- [620] arXiv:2401.04747 (cross-list from cs.SD) [ pdf , other ]
-
Title: DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture GenerationComments: Accepted by CVPR 2024. Project page: this https URLSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Audio and Speech Processing (eess.AS)
Abstract: We propose DiffSHEG, a Diffusion-based approach for Speech-driven Holistic 3D Expression and Gesture generation with arbitrary length. While previous works focused on co-speech gesture or expression generation individually, the joint generation of synchronized expressions and gestures remains barely explored. To address this, our diffusion-based co-speech motion generation transformer enables uni-directional information flow from expression to gesture, facilitating improved matching of joint expression-gesture distributions. Furthermore, we introduce an outpainting-based sampling strategy for arbitrary long sequence generation in diffusion models, offering flexibility and computational efficiency. Our method provides a practical solution that produces high-quality synchronized expression and gesture generation driven by speech. Evaluated on two public datasets, our approach achieves state-of-the-art performance both quantitatively and qualitatively. Additionally, a user study confirms the superiority of DiffSHEG over prior approaches. By enabling the real-time generation of expressive and synchronized motions, DiffSHEG showcases its potential for various applications in the development of digital humans and embodied agents.
- [621] arXiv:2401.04748 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Convolutional Neural Network Ensemble Learning for Hyperspectral Imaging-based Blackberry Fruit Ripeness Detection in Uncontrolled Farm EnvironmentAuthors: Chollette C. Olisah , Ben Trewhella , Bo Li , Melvyn L. Smith , Benjamin Winstone , E. Charles Whitfield , Felicidad Fernández Fernández , Harriet DuncalfeComments: 25 pages, 10 figures, 6 tables; submited to EAAISubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Fruit ripeness estimation models have for decades depended on spectral index features or colour-based features, such as mean, standard deviation, skewness, colour moments, and/or histograms for learning traits of fruit ripeness. Recently, few studies have explored the use of deep learning techniques to extract features from images of fruits with visible ripeness cues. However, the blackberry (Rubus fruticosus) fruit does not show obvious and reliable visible traits of ripeness when mature and therefore poses great difficulty to fruit pickers. The mature blackberry, to the human eye, is black before, during, and post-ripening. To address this engineering application challenge, this paper proposes a novel multi-input convolutional neural network (CNN) ensemble classifier for detecting subtle traits of ripeness in blackberry fruits. The multi-input CNN was created from a pre-trained visual geometry group 16-layer deep convolutional network (VGG16) model trained on the ImageNet dataset. The fully connected layers were optimized for learning traits of ripeness of mature blackberry fruits. The resulting model served as the base for building homogeneous ensemble learners that were ensemble using the stack generalization ensemble (SGE) framework. The input to the network is images acquired with a stereo sensor using visible and near-infrared (VIS-NIR) spectral filters at wavelengths of 700 nm and 770 nm. Through experiments, the proposed model achieved 95.1% accuracy on unseen sets and 90.2% accuracy with in-field conditions. Further experiments reveal that machine sensory is highly and positively correlated to human sensory over blackberry fruit skin texture.
- [622] arXiv:2401.04749 (cross-list from cs.LG) [ pdf , other ]
-
Title: LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly DetectionAuthors: Hongcheng Guo , Jian Yang , Jiaheng Liu , Jiaqi Bai , Boyang Wang , Zhoujun Li , Tieqiao Zheng , Bo Zhang , Junran peng , Qi TianComments: arXiv admin note: text overlap with arXiv:2201.00016Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: Log anomaly detection is a key component in the field of artificial intelligence for IT operations (AIOps). Considering log data of variant domains, retraining the whole network for unknown domains is inefficient in real industrial scenarios. However, previous deep models merely focused on extracting the semantics of log sequences in the same domain, leading to poor generalization on multi-domain logs. To alleviate this issue, we propose a unified Transformer-based framework for Log anomaly detection (LogFormer) to improve the generalization ability across different domains, where we establish a two-stage process including the pre-training and adapter-based tuning stage. Specifically, our model is first pre-trained on the source domain to obtain shared semantic knowledge of log data. Then, we transfer such knowledge to the target domain via shared parameters. Besides, the Log-Attention module is proposed to supplement the information ignored by the log-paring. The proposed method is evaluated on three public and one real-world datasets. Experimental results on multiple benchmarks demonstrate the effectiveness of our LogFormer with fewer trainable parameters and lower training costs.
- [623] arXiv:2401.04757 (cross-list from cs.LG) [ pdf , other ]
-
Title: How predictable is language model benchmark performance?Authors: David OwenSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We investigate large language model performance across five orders of magnitude of compute scaling in eleven recent model architectures. We show that average benchmark performance, aggregating over many individual tasks and evaluations as in the commonly-used BIG-Bench dataset, is decently predictable as a function of training compute scale. Specifically, when extrapolating BIG-Bench Hard performance across one order of magnitude in compute, we observe average absolute errors of 6 percentage points (pp). By contrast, extrapolation for individual BIG-Bench tasks across an order of magnitude in compute yields higher average errors of 18pp. Nonetheless, individual task performance remains significantly more predictable than chance. Overall, our work suggests compute scaling provides a promising basis to forecast AI capabilities in diverse benchmarks, though predicting performance in specific tasks poses challenges.
- [624] arXiv:2401.04820 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Phishing Website Detection through Multi-Model Analysis of HTML ContentSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The way we communicate and work has changed significantly with the rise of the Internet. While it has opened up new opportunities, it has also brought about an increase in cyber threats. One common and serious threat is phishing, where cybercriminals employ deceptive methods to steal sensitive information.This study addresses the pressing issue of phishing by introducing an advanced detection model that meticulously focuses on HTML content. Our proposed approach integrates a specialized Multi-Layer Perceptron (MLP) model for structured tabular data and two pretrained Natural Language Processing (NLP) models for analyzing textual features such as page titles and content. The embeddings from these models are harmoniously combined through a novel fusion process. The resulting fused embeddings are then input into a linear classifier. Recognizing the scarcity of recent datasets for comprehensive phishing research, our contribution extends to the creation of an up-to-date dataset, which we openly share with the community. The dataset is meticulously curated to reflect real-life phishing conditions, ensuring relevance and applicability. The research findings highlight the effectiveness of the proposed approach, with the CANINE demonstrating superior performance in analyzing page titles and the RoBERTa excelling in evaluating page content. The fusion of two NLP and one MLP model,termed MultiText-LP, achieves impressive results, yielding a 96.80 F1 score and a 97.18 accuracy score on our research dataset. Furthermore, our approach outperforms existing methods on the CatchPhish HTML dataset, showcasing its efficacies.
- [625] arXiv:2401.04821 (cross-list from cs.CL) [ pdf , other ]
-
Title: MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot TransferSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Transformer-based pre-trained language models (PLMs) have achieved remarkable performance in various natural language processing (NLP) tasks. However, pre-training such models can take considerable resources that are almost only available to high-resource languages. On the contrary, static word embeddings are easier to train in terms of computing resources and the amount of data required. In this paper, we introduce MoSECroT Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer), a novel and challenging task that is especially relevant to low-resource languages for which static word embeddings are available. To tackle the task, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language. In this way, we can train the PLM on source-language training data and perform zero-shot transfer to the target language by simply swapping the embedding layer. However, through extensive experiments on two classification datasets, we show that although our proposed framework is competitive with weak baselines when addressing MoSECroT, it fails to achieve competitive results compared with some strong baselines. In this paper, we attempt to explain this negative result and provide several thoughts on possible improvement.
- [626] arXiv:2401.04851 (cross-list from cs.MA) [ pdf , other ]
-
Title: Graph Learning-based Fleet Scheduling for Urban Air Mobility under Operational Constraints, Varying Demand & UncertaintiesComments: This paper is accepted to be presented at the ACM Symposium on Applied Computing 2024Subjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper develops a graph reinforcement learning approach to online planning of the schedule and destinations of electric aircraft that comprise an urban air mobility (UAM) fleet operating across multiple vertiports. This fleet scheduling problem is formulated to consider time-varying demand, constraints related to vertiport capacity, aircraft capacity and airspace safety guidelines, uncertainties related to take-off delay, weather-induced route closures, and unanticipated aircraft downtime. Collectively, such a formulation presents greater complexity, and potentially increased realism, than in existing UAM fleet planning implementations. To address these complexities, a new policy architecture is constructed, primary components of which include: graph capsule conv-nets for encoding vertiport and aircraft-fleet states both abstracted as graphs; transformer layers encoding time series information on demand and passenger fare; and a Multi-head Attention-based decoder that uses the encoded information to compute the probability of selecting each available destination for an aircraft. Trained with Proximal Policy Optimization, this policy architecture shows significantly better performance in terms of daily averaged profits on unseen test scenarios involving 8 vertiports and 40 aircraft, when compared to a random baseline and genetic algorithm-derived optimal solutions, while being nearly 1000 times faster in execution than the latter.
- [627] arXiv:2401.04858 (cross-list from cs.CL) [ pdf , other ]
-
Title: User Embedding Model for Personalized Language PromptingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: Modeling long histories plays a pivotal role in enhancing recommendation systems, allowing to capture user's evolving preferences, resulting in more precise and personalized recommendations. In this study we tackle the challenges of modeling long user histories for preference understanding in natural language. Specifically, we introduce a new User Embedding Module (UEM) that efficiently processes user history in free-form text by compressing and representing them as embeddings, to use them as soft prompts to a LM. Our experiments demonstrate the superior capability of this approach in handling significantly longer histories compared to conventional text based prompting methods, yielding substantial improvements in predictive performance. The main contribution of this research is to demonstrate the ability to bias language models with user signals represented as embeddings.
- [628] arXiv:2401.04867 (cross-list from cs.CL) [ pdf , other ]
-
Title: An Analysis of User Behaviors for Objectively Evaluating Spoken Dialogue SystemsComments: This paper has been accepted for presentation at International Workshop on Spoken Dialogue Systems Technology 2024 (IWSDS 2024) and represents the author's version of the workSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Establishing evaluation schemes for spoken dialogue systems is important, but it can also be challenging. While subjective evaluations are commonly used in user experiments, objective evaluations are necessary for research comparison and reproducibility. To address this issue, we propose a framework for indirectly but objectively evaluating systems based on users' behaviors. In this paper, to this end, we investigate the relationship between user behaviors and subjective evaluation scores in social dialogue tasks: attentive listening, job interview, and first-meeting conversation. The results reveal that in dialogue tasks where user utterances are primary, such as attentive listening and job interview, indicators like the number of utterances and words play a significant role in evaluation. Observing disfluency also can indicate the effectiveness of formal tasks, such as job interview. On the other hand, in dialogue tasks with high interactivity, such as first-meeting conversation, behaviors related to turn-taking, like average switch pause length, become more important. These findings suggest that selecting appropriate user behaviors can provide valuable insights for objective evaluation in each social dialogue task.
- [629] arXiv:2401.04898 (cross-list from cs.CL) [ pdf , other ]
-
Title: ANGO: A Next-Level Evaluation Benchmark For Generation-Oriented Language Models In Chinese DomainAuthors: Bingchao WangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recently, various Large Language Models (LLMs) evaluation datasets have emerged, but most of them have issues with distorted rankings and difficulty in model capabilities analysis. Addressing these concerns, this paper introduces ANGO, a Chinese multi-choice question evaluation benchmark. ANGO proposes Keypoint categorization standard for the first time, each question in ANGO can correspond to multiple keypoints, effectively enhancing interpretability of evaluation results. Base on performance of real humans, we build a quantifiable question difficulty standard and divide ANGO questions into 9 difficulty levels, which provide more precise guidance for model training. To minimize data leakage impact and fully leverage ANGO's innovative features, we have engineered exclusive sampling strategies and a new evaluation framework that support swift testset iteration. Our experiments demonstrate that ANGO poses a stronger challenge to models and reveals more details in evaluation result compared to existing benchmarks.
- [630] arXiv:2401.04925 (cross-list from cs.CL) [ pdf , other ]
-
Title: The Impact of Reasoning Step Length on Large Language ModelsAuthors: Mingyu Jin , Qinkai Yu , Dong Shu , Haiyan Zhao , Wenyue Hua , Yanda Meng , Yongfeng Zhang , Mengnan DuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correlation between the effectiveness of CoT and the length of reasoning steps in prompts remains largely unknown. To shed light on this, we have conducted several empirical experiments to explore the relations. Specifically, we design experiments that expand and compress the rationale reasoning steps within CoT demonstrations, while keeping all other factors constant. We have the following key findings. First, the results indicate that lengthening the reasoning steps in prompts, even without adding new information into the prompt, considerably enhances LLMs' reasoning abilities across multiple datasets. Alternatively, shortening the reasoning steps, even while preserving the key information, significantly diminishes the reasoning abilities of models. This finding highlights the importance of the number of steps in CoT prompts and provides practical guidance to make better use of LLMs' potential in complex problem-solving scenarios. Second, we also investigated the relationship between the performance of CoT and the rationales used in demonstrations. Surprisingly, the result shows that even incorrect rationales can yield favorable outcomes if they maintain the requisite length of inference. Third, we observed that the advantages of increasing reasoning steps are task-dependent: simpler tasks require fewer steps, whereas complex tasks gain significantly from longer inference sequences.
- [631] arXiv:2401.04929 (cross-list from cs.CR) [ pdf , other ]
-
Title: Learning-Based Difficulty Calibration for Enhanced Membership Inference AttacksComments: 10 pages, 14 figuresSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Machine learning models, in particular deep neural networks, are currently an integral part of various applications, from healthcare to finance. However, using sensitive data to train these models raises concerns about privacy and security. One method that has emerged to verify if the trained models are privacy-preserving is Membership Inference Attacks (MIA), which allows adversaries to determine whether a specific data point was part of a model's training dataset. While a series of MIAs have been proposed in the literature, only a few can achieve high True Positive Rates (TPR) in the low False Positive Rate (FPR) region (0.01%~1%). This is a crucial factor to consider for an MIA to be practically useful in real-world settings. In this paper, we present a novel approach to MIA that is aimed at significantly improving TPR at low FPRs. Our method, named learning-based difficulty calibration for MIA(LDC-MIA), characterizes data records by their hardness levels using a neural network classifier to determine membership. The experiment results show that LDC-MIA can improve TPR at low FPR by up to 4x compared to the other difficulty calibration based MIAs. It also has the highest Area Under ROC curve (AUC) across all datasets. Our method's cost is comparable with most of the existing MIAs, but is orders of magnitude more efficient than one of the state-of-the-art methods, LiRA, while achieving similar performance.
- [632] arXiv:2401.04934 (cross-list from cs.MA) [ pdf , ps , other ]
-
Title: Fully Decentralized Cooperative Multi-Agent Reinforcement Learning: A SurveyComments: The first two authors contribute equally with an alphabetic orderSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Cooperative multi-agent reinforcement learning is a powerful tool to solve many real-world cooperative tasks, but restrictions of real-world applications may require training the agents in a fully decentralized manner. Due to the lack of information about other agents, it is challenging to derive algorithms that can converge to the optimal joint policy in a fully decentralized setting. Thus, this research area has not been thoroughly studied. In this paper, we seek to systematically review the fully decentralized methods in two settings: maximizing a shared reward of all agents and maximizing the sum of individual rewards of all agents, and discuss open questions and future research directions.
- [633] arXiv:2401.04978 (cross-list from cs.LG) [ pdf , other ]
-
Title: Closed-Form Interpretation of Neural Network Classifiers with Symbolic Regression GradientsAuthors: Sebastian Johann WetzelSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: I introduce a unified framework for interpreting neural network classifiers tailored toward automated scientific discovery. In contrast to neural network-based regression, for classification, it is in general impossible to find a one-to-one mapping from the neural network to a symbolic equation even if the neural network itself bases its classification on a quantity that can be written as a closed-form equation. In this paper, I embed a trained neural network into an equivalence class of classifying functions that base their decisions on the same quantity. I interpret neural networks by finding an intersection between this equivalence class and human-readable equations defined by the search space of symbolic regression. The approach is not limited to classifiers or full neural networks and can be applied to arbitrary neurons in hidden layers or latent spaces or to simplify the process of interpreting neural network regressors.
- [634] arXiv:2401.04979 (cross-list from cs.LG) [ pdf , other ]
-
Title: Invertible Solution of Neural Differential Equations for Analysis of Irregularly-Sampled Time SeriesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: To handle the complexities of irregular and incomplete time series data, we propose an invertible solution of Neural Differential Equations (NDE)-based method. While NDE-based methods are a powerful method for analyzing irregularly-sampled time series, they typically do not guarantee reversible transformations in their standard form. Our method suggests the variation of Neural Controlled Differential Equations (Neural CDEs) with Neural Flow, which ensures invertibility while maintaining a lower computational burden. Additionally, it enables the training of a dual latent space, enhancing the modeling of dynamic temporal dynamics. Our research presents an advanced framework that excels in both classification and interpolation tasks. At the core of our approach is an enhanced dual latent states architecture, carefully designed for high precision across various time series tasks. Empirical analysis demonstrates that our method significantly outperforms existing models. This work significantly advances irregular time series analysis, introducing innovative techniques and offering a versatile tool for diverse practical applications.
- [635] arXiv:2401.04980 (cross-list from cs.RO) [ pdf , other ]
-
Title: Autonomous Navigation of Tractor-Trailer Vehicles through Roundabout IntersectionsJournal-ref: TACTFUL 2023Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, significant advancements have been made in the field of autonomous driving with the aim of increasing safety and efficiency. However, research that focuses on tractor-trailer vehicles is relatively sparse. Due to the physical characteristics and articulated joints, such vehicles require tailored models. While turning, the back wheels of the trailer turn at a tighter radius and the truck often has to deviate from the centre of the lane to accommodate this. Due to the lack of publicly available models, this work develops truck and trailer models using the high-fidelity simulation software CARLA, together with several roundabout scenarios, to establish a baseline dataset for benchmarks. Using a twin-q soft actor-critic algorithm, we train a quasi-end-to-end autonomous driving model which is able to achieve a 73% success rate on different roundabouts.
- [636] arXiv:2401.04993 (cross-list from cs.LG) [ pdf , other ]
-
Title: AdaFed: Fair Federated Learning via Adaptive Common Descent DirectionComments: This paper has been accepted in Transactions on Machine Learning Research. This is the link to the paper: this https URL &referrer=%5Bthe%20profile%20of%20Shayan%20Mohajer%20Hamidi%5D(%2Fprofile%3Fid%3D~Shayan_Mohajer_Hamidi1)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Federated learning (FL) is a promising technology via which some edge devices/clients collaboratively train a machine learning model orchestrated by a server. Learning an unfair model is known as a critical problem in federated learning, where the trained model may unfairly advantage or disadvantage some of the devices. To tackle this problem, in this work, we propose AdaFed. The goal of AdaFed is to find an updating direction for the server along which (i) all the clients' loss functions are decreasing; and (ii) more importantly, the loss functions for the clients with larger values decrease with a higher rate. AdaFed adaptively tunes this common direction based on the values of local gradients and loss functions. We validate the effectiveness of AdaFed on a suite of federated datasets, and demonstrate that AdaFed outperforms state-of-the-art fair FL methods.
- [637] arXiv:2401.05010 (cross-list from cs.CV) [ pdf , other ]
-
Title: Less is More: A Closer Look at Semantic-based Few-Shot LearningSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Few-shot Learning aims to learn and distinguish new categories with a very limited number of available images, presenting a significant challenge in the realm of deep learning. Recent researchers have sought to leverage the additional textual or linguistic information of these rare categories with a pre-trained language model to facilitate learning, thus partially alleviating the problem of insufficient supervision signals. However, the full potential of the textual information and pre-trained language model have been underestimated in the few-shot learning till now, resulting in limited performance enhancements. To address this, we propose a simple but effective framework for few-shot learning tasks, specifically designed to exploit the textual information and language model. In more detail, we explicitly exploit the zero-shot capability of the pre-trained language model with the learnable prompt. And we just add the visual feature with the textual feature for inference directly without the intricate designed fusion modules in previous works. Additionally, we apply the self-ensemble and distillation to further enhance these components. Our extensive experiments conducted across four widely used few-shot datasets demonstrate that our simple framework achieves impressive results. Particularly noteworthy is its outstanding performance in the 1-shot learning task, surpassing state-of-the-art methods by an average of 3.0\% in classification accuracy. \footnote{We will make the source codes of the proposed framework publicly available upon acceptance. }.
- [638] arXiv:2401.05014 (cross-list from cs.CV) [ pdf , other ]
-
Title: Source-Free Cross-Modal Knowledge Transfer by Unleashing the Potential of Task-Irrelevant DataSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Source-free cross-modal knowledge transfer is a crucial yet challenging task, which aims to transfer knowledge from one source modality (e.g., RGB) to the target modality (e.g., depth or infrared) with no access to the task-relevant (TR) source data due to memory and privacy concerns. A recent attempt leverages the paired task-irrelevant (TI) data and directly matches the features from them to eliminate the modality gap. However, it ignores a pivotal clue that the paired TI data could be utilized to effectively estimate the source data distribution and better facilitate knowledge transfer to the target modality. To this end, we propose a novel yet concise framework to unlock the potential of paired TI data for enhancing source-free cross-modal knowledge transfer. Our work is buttressed by two key technical components. Firstly, to better estimate the source data distribution, we introduce a Task-irrelevant data-Guided Modality Bridging (TGMB) module. It translates the target modality data (e.g., infrared) into the source-like RGB images based on paired TI data and the guidance of the available source model to alleviate two key gaps: 1) inter-modality gap between the paired TI data; 2) intra-modality gap between TI and TR target data. We then propose a Task-irrelevant data-Guided Knowledge Transfer (TGKT) module that transfers knowledge from the source model to the target model by leveraging the paired TI data. Notably, due to the unavailability of labels for the TR target data and its less reliable prediction from the source model, our TGKT model incorporates a self-supervised pseudo-labeling approach to enable the target model to learn from its predictions. Extensive experiments show that our method achieves state-of-the-art performance on three datasets (RGB-to-depth and RGB-to-infrared).
- [639] arXiv:2401.05033 (cross-list from cs.CL) [ pdf , other ]
-
Title: Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-TalkSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) are powerful dialogue agents, but specializing them towards fulfilling a specific function can be challenging. Instructing tuning, i.e. tuning models on instruction and sample responses generated by humans (Ouyang et al., 2022), has proven as an effective method to do so, yet requires a number of data samples that a) might not be available or b) costly to generate. Furthermore, this cost increases when the goal is to make the LLM follow a specific workflow within a dialogue instead of single instructions. Inspired by the self-play technique in reinforcement learning and the use of LLMs to simulate human agents, we propose a more effective method for data collection through LLMs engaging in a conversation in various roles. This approach generates a training data via "self-talk" of LLMs that can be refined and utilized for supervised fine-tuning. We introduce an automated way to measure the (partial) success of a dialogue. This metric is used to filter the generated conversational data that is fed back in LLM for training. Based on our automated and human evaluations of conversation quality, we demonstrate that such self-talk data improves results. In addition, we examine the various characteristics that showcase the quality of generated dialogues and how they can be connected to their potential utility as training data.
- [640] arXiv:2401.05043 (cross-list from cs.LG) [ pdf , other ]
-
Title: CreINNs: Credal-Set Interval Neural Networks for Uncertainty Estimation in Classification TasksAuthors: Kaizheng Wang , Keivan Shariatmadar , Shireen Kudukkil Manchingal , Fabio Cuzzolin , David Moens , Hans HallezSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Uncertainty estimation is increasingly attractive for improving the reliability of neural networks. In this work, we present novel credal-set interval neural networks (CreINNs) designed for classification tasks. CreINNs preserve the traditional interval neural network structure, capturing weight uncertainty through deterministic intervals, while forecasting credal sets using the mathematical framework of probability intervals. Experimental validations on an out-of-distribution detection benchmark (CIFAR10 vs SVHN) showcase that CreINNs outperform epistemic uncertainty estimation when compared to variational Bayesian neural networks (BNNs) and deep ensembles (DEs). Furthermore, CreINNs exhibit a notable reduction in computational complexity compared to variational BNNs and demonstrate smaller model sizes than DEs.
- [641] arXiv:2401.05054 (cross-list from cs.CL) [ pdf , other ]
-
Title: Generating Diverse and High-Quality Texts by Minimum Bayes Risk DecodingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: One of the most important challenges in text generation systems is to produce outputs that are not only correct but also diverse. Recently, Minimum Bayes-Risk (MBR) decoding has gained prominence for generating sentences of the highest quality among the decoding algorithms. However, existing algorithms proposed for generating diverse outputs are predominantly based on beam search or random sampling, thus their output quality is capped by these underlying methods. In this paper, we investigate an alternative approach -- we develop diversity-promoting decoding algorithms by enforcing diversity objectives to MBR decoding. We propose two variants of MBR, Diverse MBR (DMBR) and $k$-medoids MBR (KMBR), methods to generate a set of sentences with high quality and diversity. We evaluate DMBR and KMBR on a variety of directed text generation tasks using encoder-decoder models and a large language model with prompting. The experimental results show that the proposed method achieves a better trade-off than the diverse beam search and sampling algorithms.
- [642] arXiv:2401.05097 (cross-list from cs.LG) [ pdf , other ]
-
Title: Any-Way Meta LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Although meta-learning seems promising performance in the realm of rapid adaptability, it is constrained by fixed cardinality. When faced with tasks of varying cardinalities that were unseen during training, the model lacks its ability. In this paper, we address and resolve this challenge by harnessing `label equivalence' emerged from stochastic numeric label assignments during episodic task sampling. Questioning what defines ``true" meta-learning, we introduce the ``any-way" learning paradigm, an innovative model training approach that liberates model from fixed cardinality constraints. Surprisingly, this model not only matches but often outperforms traditional fixed-way models in terms of performance, convergence speed, and stability. This disrupts established notions about domain generalization. Furthermore, we argue that the inherent label equivalence naturally lacks semantic information. To bridge this semantic information gap arising from label equivalence, we further propose a mechanism for infusing semantic class information into the model. This would enhance the model's comprehension and functionality. Experiments conducted on renowned architectures like MAML and ProtoNet affirm the effectiveness of our method.
- [643] arXiv:2401.05115 (cross-list from cs.HC) [ pdf , other ]
-
Title: Unpacking Human-AI interactions: From interaction primitives to a design spaceSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: This paper aims to develop a semi-formal design space for Human-AI interactions, by building a set of interaction primitives which specify the communication between users and AI systems during their interaction. We show how these primitives can be combined into a set of interaction patterns which can provide an abstract specification for exchanging messages between humans and AI/ML models to carry out purposeful interactions. The motivation behind this is twofold: firstly, to provide a compact generalisation of existing practices, that highlights the similarities and differences between systems in terms of their interaction behaviours; and secondly, to support the creation of new systems, in particular by opening the space of possibilities for interactions with models. We present a short literature review on frameworks, guidelines and taxonomies related to the design and implementation of HAI interactions, including human-in-the-loop, explainable AI, as well as hybrid intelligence and collaborative learning approaches. From the literature review, we define a vocabulary for describing information exchanges in terms of providing and requesting particular model-specific data types. Based on this vocabulary, a message passing model for interactions between humans and models is presented, which we demonstrate can account for existing systems and approaches. Finally, we build this into design patterns as mid-level constructs that capture common interactional structures. We discuss how this approach can be used towards a design space for Human-AI interactions that creates new possibilities for designs as well as keeping track of implementation issues and concerns.
- [644] arXiv:2401.05159 (cross-list from cs.CV) [ pdf , other ]
-
Title: Derm-T2IM: Harnessing Synthetic Skin Lesion Data via Stable Diffusion Models for Enhanced Skin Disease Classification using ViT and CNNComments: Paper is submitted in EMBC 2024 ConferenceSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: This study explores the utilization of Dermatoscopic synthetic data generated through stable diffusion models as a strategy for enhancing the robustness of machine learning model training. Synthetic data generation plays a pivotal role in mitigating challenges associated with limited labeled datasets, thereby facilitating more effective model training. In this context, we aim to incorporate enhanced data transformation techniques by extending the recent success of few-shot learning and a small amount of data representation in text-to-image latent diffusion models. The optimally tuned model is further used for rendering high-quality skin lesion synthetic data with diverse and realistic characteristics, providing a valuable supplement and diversity to the existing training data. We investigate the impact of incorporating newly generated synthetic data into the training pipeline of state-of-art machine learning models, assessing its effectiveness in enhancing model performance and generalization to unseen real-world data. Our experimental results demonstrate the efficacy of the synthetic data generated through stable diffusion models helps in improving the robustness and adaptability of end-to-end CNN and vision transformer models on two different real-world skin lesion datasets.
- [645] arXiv:2401.05163 (cross-list from cs.CV) [ pdf , other ]
-
Title: MISS: A Generative Pretraining and Finetuning Approach for Med-VQASubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Medical visual question answering (VQA) is a challenging multimodal task, where Vision-Language Pre-training (VLP) models can effectively improve the generalization performance. However, most methods in the medical field treat VQA as an answer classification task which is difficult to transfer to practical application scenarios. Additionally, due to the privacy of medical images and the expensive annotation process, large-scale medical image-text pairs datasets for pretraining are severely lacking. In this paper, we propose a large-scale MultI-task Self-Supervised learning based framework (MISS) for medical VQA tasks. Unlike existing methods, we treat medical VQA as a generative task. We unify the text encoder and multimodal encoder and align image-text features through multi-task learning. Furthermore, we propose a Transfer-and-Caption method that extends the feature space of single-modal image datasets using large language models (LLMs), enabling those traditional medical vision field task data to be applied to VLP. Experiments show that our method achieves excellent results with fewer multimodal datasets and demonstrates the advantages of generative VQA models. The code and model weights will be released upon the paper's acceptance.
- [646] arXiv:2401.05176 (cross-list from cs.CL) [ pdf , other ]
-
Title: Convergences and Divergences between Automatic Assessment and Human Evaluation: Insights from Comparing ChatGPT-Generated Translation and Neural Machine TranslationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models have demonstrated parallel and even superior translation performance compared to neural machine translation (NMT) systems. However, existing comparative studies between them mainly rely on automated metrics, raising questions into the feasibility of these metrics and their alignment with human judgment. The present study investigates the convergences and divergences between automated metrics and human evaluation in assessing the quality of machine translation from ChatGPT and three NMT systems. To perform automatic assessment, four automated metrics are employed, while human evaluation incorporates the DQF-MQM error typology and six rubrics. Notably, automatic assessment and human evaluation converge in measuring formal fidelity (e.g., error rates), but diverge when evaluating semantic and pragmatic fidelity, with automated metrics failing to capture the improvement of ChatGPT's translation brought by prompt engineering. These results underscore the indispensable role of human judgment in evaluating the performance of advanced translation tools at the current stage.
- [647] arXiv:2401.05193 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Experiment Planning with Function ApproximationComments: 10 pages mainSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: We study the problem of experiment planning with function approximation in contextual bandit problems. In settings where there is a significant overhead to deploying adaptive algorithms -- for example, when the execution of the data collection policies is required to be distributed, or a human in the loop is needed to implement these policies -- producing in advance a set of policies for data collection is paramount. We study the setting where a large dataset of contexts but not rewards is available and may be used by the learner to design an effective data collection strategy. Although when rewards are linear this problem has been well studied, results are still missing for more complex reward models. In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small. We finalize our results introducing a statistical gap fleshing out the fundamental differences between planning and adaptive learning and provide results for planning with model selection.
- [648] arXiv:2401.05194 (cross-list from cs.RO) [ pdf , other ]
-
Title: Modelling, Positioning, and Deep Reinforcement Learning Path Tracking Control of Scaled Robotic Vehicles: Design and Experimental ValidationAuthors: Carmine Caponio , Pietro Stano , Raffaele Carli , Ignazio Olivieri , Daniele Ragone , Aldo Sorniotti , Umberto MontanaroComments: Under review on IEEE TransactionsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Mobile robotic systems are becoming increasingly popular. These systems are used in various indoor applications, raging from warehousing and manufacturing to test benches for assessment of advanced control strategies, such as artificial intelligence (AI)-based control solutions, just to name a few. Scaled robotic cars are commonly equipped with a hierarchical control acthiecture that includes tasks dedicated to vehicle state estimation and control. This paper covers both aspects by proposing (i) a federeted extended Kalman filter (FEKF), and (ii) a novel deep reinforcement learning (DRL) path tracking controller trained via an expert demonstrator to expedite the learning phase and increase robustess to the simulation-to-reality gap. The paper also presents the formulation of a vehicle model along with an effective yet simple procedure for identifying tis paramters. The experimentally validated model is used for (i) supporting the design of the FEKF and (ii) serving as a digital twin for training the proposed DRL-based path tracking algorithm. Experimental results confirm the ability of the FEKF to improve the estimate of the mobile robot's position. Furthermore, the effectiveness of the DRL path tracking strateguy is experimentally tested along manoeuvres not considered during training, showing also the ability of the AI-based solution to outpeform model-based control strategies and the demonstrator. The comparison with benchmraking controllers is quantitavely evalueted through a set of key performance indicators.
- [649] arXiv:2401.05199 (cross-list from cs.CL) [ pdf , other ]
-
Title: Monte Carlo Tree Search for Recipe Generation using GPT-2Comments: 10 pages, 1 figure, ICCC 2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Automatic food recipe generation methods provide a creative tool for chefs to explore and to create new, and interesting culinary delights. Given the recent success of large language models (LLMs), they have the potential to create new recipes that can meet individual preferences, dietary constraints, and adapt to what is in your refrigerator. Existing research on using LLMs to generate recipes has shown that LLMs can be finetuned to generate realistic-sounding recipes. However, on close examination, these generated recipes often fail to meet basic requirements like including chicken as an ingredient in chicken dishes. In this paper, we propose RecipeMC, a text generation method using GPT-2 that relies on Monte Carlo Tree Search (MCTS). RecipeMC allows us to define reward functions to put soft constraints on text generation and thus improve the credibility of the generated recipes. Our results show that human evaluators prefer recipes generated with RecipeMC more often than recipes generated with other baseline methods when compared with real recipes.
- [650] arXiv:2401.05200 (cross-list from cs.HC) [ pdf , other ]
-
Title: Knowledge Sharing in Manufacturing using Large Language Models: User Evaluation and Model BenchmarkingAuthors: Samuel Kernan Freire , Chaofan Wang , Mina Foosherian , Stefan Wellsandt , Santiago Ruiz-Arenas , Evangelos NiforatosComments: 11 pages, 3 figures, and 1 table. Under reviewSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Recent advances in natural language processing enable more intelligent ways to support knowledge sharing in factories. In manufacturing, operating production lines has become increasingly knowledge-intensive, putting strain on a factory's capacity to train and support new operators. This paper introduces a Large Language Model (LLM)-based system designed to retrieve information from the extensive knowledge contained in factory documentation and knowledge shared by expert operators. The system aims to efficiently answer queries from operators and facilitate the sharing of new knowledge. We conducted a user study at a factory to assess its potential impact and adoption, eliciting several perceived benefits, namely, enabling quicker information retrieval and more efficient resolution of issues. However, the study also highlighted a preference for learning from a human expert when such an option is available. Furthermore, we benchmarked several commercial and open-sourced LLMs for this system. The current state-of-the-art model, GPT-4, consistently outperformed its counterparts, with open-source models trailing closely, presenting an attractive option given their data privacy and customization benefits. In summary, this work offers preliminary insights and a system design for factories considering using LLM tools for knowledge management.
- [651] arXiv:2401.05204 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Novel Prompt-tuning Method: Incorporating Scenario-specific Concepts into a VerbalizerSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The verbalizer, which serves to map label words to class labels, is an essential component of prompt-tuning. In this paper, we present a novel approach to constructing verbalizers. While existing methods for verbalizer construction mainly rely on augmenting and refining sets of synonyms or related words based on class names, this paradigm suffers from a narrow perspective and lack of abstraction, resulting in limited coverage and high bias in the label-word space. To address this issue, we propose a label-word construction process that incorporates scenario-specific concepts. Specifically, we extract rich concepts from task-specific scenarios as label-word candidates and then develop a novel cascade calibration module to refine the candidates into a set of label words for each class. We evaluate the effectiveness of our proposed approach through extensive experiments on {five} widely used datasets for zero-shot text classification. The results demonstrate that our method outperforms existing methods and achieves state-of-the-art results.
- [652] arXiv:2401.05215 (cross-list from cs.CL) [ pdf , other ]
-
Title: Pre-trained Large Language Models for Financial Sentiment AnalysisSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Financial sentiment analysis refers to classifying financial text contents into sentiment categories (e.g. positive, negative, and neutral). In this paper, we focus on the classification of financial news title, which is a challenging task due to a lack of large amount of training samples. To overcome this difficulty, we propose to adapt the pretrained large language models (LLMs) [1, 2, 3] to solve this problem. The LLMs, which are trained from huge amount of text corpora,have an advantage in text understanding and can be effectively adapted to domain-specific task while requiring very few amount of training samples. In particular, we adapt the open-source Llama2-7B model (2023) with the supervised fine-tuning (SFT) technique [4]. Experimental evaluation shows that even with the 7B model (which is relatively small for LLMs), our approach significantly outperforms the previous state-of-the-art algorithms.
- [653] arXiv:2401.05219 (cross-list from cs.CE) [ pdf , other ]
-
Title: Distributed Monitoring for Data Distribution Shifts in Edge-ML Fraud DetectionSubjects: Computational Engineering, Finance, and Science (cs.CE) ; Artificial Intelligence (cs.AI)
Abstract: The digital era has seen a marked increase in financial fraud. edge ML emerged as a promising solution for smartphone payment services fraud detection, enabling the deployment of ML models directly on edge devices. This approach enables a more personalized real-time fraud detection. However, a significant gap in current research is the lack of a robust system for monitoring data distribution shifts in these distributed edge ML applications. Our work bridges this gap by introducing a novel open-source framework designed for continuous monitoring of data distribution shifts on a network of edge devices. Our system includes an innovative calculation of the Kolmogorov-Smirnov (KS) test over a distributed network of edge devices, enabling efficient and accurate monitoring of users behavior shifts. We comprehensively evaluate the proposed framework employing both real-world and synthetic financial transaction datasets and demonstrate the framework's effectiveness.
- [654] arXiv:2401.05224 (cross-list from cs.CV) [ pdf , other ]
-
Title: Do Vision and Language Encoders Represent the World Similarly?Authors: Mayug Maniparambil , Raiymbek Akshulakov , Yasser Abdelaziz Dahou Djilali , Sanath Narayan , Mohamed El Amine Seddik , Karttikeya Mangalam , Noel E. O'ConnorComments: Accepted CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Aligned text-image encoders such as CLIP have become the de facto model for vision-language tasks. Furthermore, modality-specific encoders achieve impressive performances in their respective domains. This raises a central question: does an alignment exist between uni-modal vision and language encoders since they fundamentally represent the same physical world? Analyzing the latent spaces structure of vision and language models on image-caption benchmarks using the Centered Kernel Alignment (CKA), we find that the representation spaces of unaligned and aligned encoders are semantically similar. In the absence of statistical similarity in aligned encoders like CLIP, we show that a possible matching of unaligned encoders exists without any training. We frame this as a seeded graph-matching problem exploiting the semantic similarity between graphs and propose two methods - a Fast Quadratic Assignment Problem optimization, and a novel localized CKA metric-based matching/retrieval. We demonstrate the effectiveness of this on several downstream tasks including cross-lingual, cross-domain caption matching and image classification. Code available at this http URL .
- [655] arXiv:2401.05251 (cross-list from cs.LG) [ pdf , other ]
-
Title: ReACT: Reinforcement Learning for Controller Parametrization using B-Spline GeometriesAuthors: Thomas Rudolf , Daniel Flögel , Tobias Schürmann , Simon Süß , Stefan Schwab , Sören HohmannComments: 7 pages, 7 figures, accepted at the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USASubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Robust and performant controllers are essential for industrial applications. However, deriving controller parameters for complex and nonlinear systems is challenging and time-consuming. To facilitate automatic controller parametrization, this work presents a novel approach using deep reinforcement learning (DRL) with N-dimensional B-spline geometries (BSGs). We focus on the control of parameter-variant systems, a class of systems with complex behavior which depends on the operating conditions. For this system class, gain-scheduling control structures are widely used in applications across industries due to well-known design principles. Facilitating the expensive controller parametrization task regarding these control structures, we deploy an DRL agent. Based on control system observations, the agent autonomously decides how to adapt the controller parameters. We make the adaptation process more efficient by introducing BSGs to map the controller parameters which may depend on numerous operating conditions. To preprocess time-series data and extract a fixed-length feature vector, we use a long short-term memory (LSTM) neural networks. Furthermore, this work contributes actor regularizations that are relevant to real-world environments which differ from training. Accordingly, we apply dropout layer normalization to the actor and critic networks of the truncated quantile critic (TQC) algorithm. To show our approach's working principle and effectiveness, we train and evaluate the DRL agent on the parametrization task of an industrial control structure with parameter lookup tables.
- [656] arXiv:2401.05268 (cross-list from cs.CL) [ pdf , other ]
-
Title: AUTOACT: Automatic Agent Learning from Scratch via Self-PlanningAuthors: Shuofei Qiao , Ningyu Zhang , Runnan Fang , Yujie Luo , Wangchunshu Zhou , Yuchen Eleanor Jiang , Chengfei Lv , Huajun ChenComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: Language agents have achieved considerable performance on various complex question-answering tasks. Despite the incessant exploration in this field, existing language agent systems still struggle with costly, non-reproducible data reliance and face the challenge of compelling a single model for multiple functions. To this end, we introduce AutoAct, an automatic agent learning framework that does not rely on large-scale annotated data and synthetic trajectories from closed-source models (e.g., GPT-4). Given limited data with a tool library, AutoAct first automatically synthesizes planning trajectories without any assistance from humans or strong closed-source models. Then, AutoAct leverages a division-of-labor strategy to automatically differentiate based on the target task information and synthesized trajectories, producing a sub-agent group to complete the task. We conduct comprehensive experiments with different LLMs, which demonstrates that AutoAct yields better or parallel performance compared to various strong baselines. Further analysis demonstrates the effectiveness of the division-of-labor strategy, with the trajectory quality generated by AutoAct significantly outperforming that of others. Code will be available at this https URL .
- [657] arXiv:2401.05273 (cross-list from cs.CL) [ pdf , other ]
-
Title: INACIA: Integrating Large Language Models in Brazilian Audit Courts: Opportunities and ChallengesAuthors: Jayr Pereira , Andre Assumpcao , Julio Trecenti , Luiz Airosa , Caio Lente , Jhonatan Cléto , Guilherme Dobins , Rodrigo Nogueira , Luis Mitchell , Roberto LotufoSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces INACIA (Instrução Assistida com Inteligência Artificial), a groundbreaking system designed to integrate Large Language Models (LLMs) into the operational framework of Brazilian Federal Court of Accounts (TCU). The system automates various stages of case analysis, including basic information extraction, admissibility examination, Periculum in mora and Fumus boni iuris analyses, and recommendations generation. Through a series of experiments, we demonstrate INACIA's potential in extracting relevant information from case documents, evaluating its legal plausibility, and formulating propositions for judicial decision-making. Utilizing a validation dataset alongside LLMs, our evaluation methodology presents a novel approach to assessing system performance, correlating highly with human judgment. These results underscore INACIA's potential in complex legal task handling while also acknowledging the current limitations. This study discusses possible improvements and the broader implications of applying AI in legal contexts, suggesting that INACIA represents a significant step towards integrating AI in legal systems globally, albeit with cautious optimism grounded in the empirical findings.
- [658] arXiv:2401.05300 (cross-list from cs.CL) [ pdf , other ]
-
Title: I am a Strange Dataset: Metalinguistic Tests for Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Statements involving metalinguistic self-reference ("This paper has six sections.") are prevalent in many domains. Can large language models (LLMs) handle such language? In this paper, we present "I am a Strange Dataset", a new dataset for addressing this question. There are two subtasks: generation and verification. In generation, models continue statements like "The penultimate word in this sentence is" (where a correct continuation is "is"). In verification, models judge the truth of statements like "The penultimate word in this sentence is sentence." (false). We also provide minimally different metalinguistic non-self-reference examples to complement the main dataset by probing for whether models can handle metalinguistic language at all. The dataset is hand-crafted by experts and validated by non-expert annotators. We test a variety of open-source LLMs (7B to 70B parameters) as well as closed-source LLMs through APIs. All models perform close to chance across both subtasks and even on the non-self-referential metalinguistic control data, though we find some steady improvement with model scale. GPT 4 is the only model to consistently do significantly better than chance, and it is still only in the 60% range, while our untrained human annotators score well in the 89-93% range. The dataset and evaluation toolkit are available at this https URL .
- [659] arXiv:2401.05302 (cross-list from cs.RO) [ pdf , other ]
-
Title: Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion?Comments: Accepted in alt.HRI 2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Large Language Models have shown exceptional generative abilities in various natural language and generation tasks. However, possible anthropomorphization and leniency towards failure cases have propelled discussions on emergent abilities of Large Language Models especially on Theory of Mind (ToM) abilities in Large Language Models. While several false-belief tests exists to verify the ability to infer and maintain mental models of another entity, we study a special application of ToM abilities that has higher stakes and possibly irreversible consequences : Human Robot Interaction. In this work, we explore the task of Perceived Behavior Recognition, where a robot employs a Large Language Model (LLM) to assess the robot's generated behavior in a manner similar to human observer. We focus on four behavior types, namely - explicable, legible, predictable, and obfuscatory behavior which have been extensively used to synthesize interpretable robot behaviors. The LLMs goal is, therefore to be a human proxy to the agent, and to answer how a certain agent behavior would be perceived by the human in the loop, for example "Given a robot's behavior X, would the human observer find it explicable?". We conduct a human subject study to verify that the users are able to correctly answer such a question in the curated situations (robot setting and plan) across five domains. A first analysis of the belief test yields extremely positive results inflating ones expectations of LLMs possessing ToM abilities. We then propose and perform a suite of perturbation tests which breaks this illusion, i.e. Inconsistent Belief, Uninformative Context and Conviction Test. We conclude that, the high score of LLMs on vanilla prompts showcases its potential use in HRI settings, however to possess ToM demands invariance to trivial or irrelevant perturbations in the context which LLMs lack.
- [660] arXiv:2401.05346 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Task tree retrieval from FOON using search algorithmsAuthors: Amitha AttapuComments: 3 pages, 3 figuresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Robots can be very useful to automate tasks and reduce the human effort required. But for the robot to know, how to perform tasks, we need to give it a clear set of steps to follow. It is nearly impossible to provide a robot with instructions for every possible task. Therefore we have a Universal Functional object-oriented network (FOON) which was created and expanded and has a lot of existing recipe information [1]. But certain tasks are complicated for robots to perform and similarly, some tasks are complicated for humans to perform. Therefore weights have been added to functional units to represent the chance of successful execution of the motion by the robot [2]. Given a set of kitchen items and a goal node, using Universal FOON, a robot must be able to determine if the required items are present in the kitchen, and if yes, get the steps to convert the required kitchen items to the goal node. Now through this paper, we use two algorithms (IDS and GBFS) to retrieve a task tree (if possible) for a goal node and a given set of kitchen items. The following would be the different parts of the paper: Section II FOON creation, where we will discuss the different terminologies related to FOON and visualization of FOON. In Section III Methodology we discuss the IDS and GBFS search algorithms and the two different heuristics implemented and used in GBFS. In Section IV Experiment/Discussion, we compare the performance of different algorithms. In the final section V, we specify the references of the papers that have been cited.
- [661] arXiv:2401.05350 (cross-list from cs.NE) [ pdf , other ]
-
Title: Adaptive operator selection utilising generalised experienceComments: Submitted to journal for publications, under reviewSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Optimisation problems, particularly combinatorial optimisation problems, are difficult to solve due to their complexity and hardness. Such problems have been successfully solved by evolutionary and swarm intelligence algorithms, especially in binary format. However, the approximation may suffer due to the the issues in balance between exploration and exploitation activities (EvE), which remain as the major challenge in this context. Although the complementary usage of multiple operators is becoming more popular for managing EvE with adaptive operator selection schemes, a bespoke adaptive selection system is still an important topic in research. Reinforcement Learning (RL) has recently been proposed as a way to customise and shape up a highly effective adaptive selection system. However, it is still challenging to handle the problem in terms of scalability. This paper proposes and assesses a RL-based novel approach to help develop a generalised framework for gaining, processing, and utilising the experiences for both the immediate and future use. The experimental results support the proposed approach with a certain level of success.
- [662] arXiv:2401.05355 (cross-list from cs.CV) [ pdf , other ]
-
Title: Developing a Resource-Constraint EdgeAI model for Surface Defect DetectionComments: Keywords: Lightweight Edge AI, Resource-constraint ML, Surface Defect DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Resource constraints have restricted several EdgeAI applications to machine learning inference approaches, where models are trained on the cloud and deployed to the edge device. This poses challenges such as bandwidth, latency, and privacy associated with storing data off-site for model building. Training on the edge device can overcome these challenges by eliminating the need to transfer data to another device for storage and model development. On-device training also provides robustness to data variations as models can be retrained on newly acquired data to improve performance. We, therefore, propose a lightweight EdgeAI architecture modified from Xception, for on-device training in a resource-constraint edge environment. We evaluate our model on a PCB defect detection task and compare its performance against existing lightweight models - MobileNetV2, EfficientNetV2B0, and MobileViT-XXS. The results of our experiment show that our model has a remarkable performance with a test accuracy of 73.45% without pre-training. This is comparable to the test accuracy of non-pre-trained MobileViT-XXS (75.40%) and much better than other non-pre-trained models (MobileNetV2 - 50.05%, EfficientNetV2B0 - 54.30%). The test accuracy of our model without pre-training is comparable to pre-trained MobileNetV2 model - 75.45% and better than pre-trained EfficientNetV2B0 model - 58.10%. In terms of memory efficiency, our model performs better than EfficientNetV2B0 and MobileViT-XXS. We find that the resource efficiency of machine learning models does not solely depend on the number of parameters but also depends on architectural considerations. Our method can be applied to other resource-constraint applications while maintaining significant performance.
- [663] arXiv:2401.05362 (cross-list from cs.CV) [ pdf , other ]
-
Title: DualTeacher: Bridging Coexistence of Unlabelled Classes for Semi-supervised Incremental Object DetectionAuthors: Ziqi Yuan , Liyuan Wang , Wenbo Ding , Xingxing Zhang , Jiachen Zhong , Jianyong Ai , Jianmin Li , Jun ZhuSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In real-world applications, an object detector often encounters object instances from new classes and needs to accommodate them effectively. Previous work formulated this critical problem as incremental object detection (IOD), which assumes the object instances of new classes to be fully annotated in incremental data. However, as supervisory signals are usually rare and expensive, the supervised IOD may not be practical for implementation. In this work, we consider a more realistic setting named semi-supervised IOD (SSIOD), where the object detector needs to learn new classes incrementally from a few labelled data and massive unlabelled data without catastrophic forgetting of old classes. A commonly-used strategy for supervised IOD is to encourage the current model (as a student) to mimic the behavior of the old model (as a teacher), but it generally fails in SSIOD because a dominant number of object instances from old and new classes are coexisting and unlabelled, with the teacher only recognizing a fraction of them. Observing that learning only the classes of interest tends to preclude detection of other classes, we propose to bridge the coexistence of unlabelled classes by constructing two teacher models respectively for old and new classes, and using the concatenation of their predictions to instruct the student. This approach is referred to as DualTeacher, which can serve as a strong baseline for SSIOD with limited resource overhead and no extra hyperparameters. We build various benchmarks for SSIOD and perform extensive experiments to demonstrate the superiority of our approach (e.g., the performance lead is up to 18.28 AP on MS-COCO). Our code is available at \url{ this https URL }.
- [664] arXiv:2401.05373 (cross-list from cs.NE) [ pdf , other ]
-
Title: Dynamic Spiking Graph Neural NetworksSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The integration of Spiking Neural Networks (SNNs) and Graph Neural Networks (GNNs) is gradually attracting attention due to the low power consumption and high efficiency in processing the non-Euclidean data represented by graphs. However, as a common problem, dynamic graph representation learning faces challenges such as high complexity and large memory overheads. Current work often uses SNNs instead of Recurrent Neural Networks (RNNs) by using binary features instead of continuous ones for efficient training, which would overlooks graph structure information and leads to the loss of details during propagation. Additionally, optimizing dynamic spiking models typically requires propagation of information across time steps, which increases memory requirements. To address these challenges, we present a framework named \underline{Dy}namic \underline{S}p\underline{i}king \underline{G}raph \underline{N}eural Networks (\method{}). To mitigate the information loss problem, \method{} propagates early-layer information directly to the last layer for information compensation. To accommodate the memory requirements, we apply the implicit differentiation on the equilibrium state, which does not rely on the exact reverse of the forward computation. While traditional implicit differentiation methods are usually used for static situations, \method{} extends it to the dynamic graph setting. Extensive experiments on three large-scale real-world dynamic graph datasets validate the effectiveness of \method{} on dynamic node classification tasks with lower computational costs.
- [665] arXiv:2401.05375 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: Classical Sorting Algorithms as a Model of Morphogenesis: self-sorting arrays reveal unexpected competencies in a minimal model of basal intelligenceSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Multiagent Systems (cs.MA)
Abstract: The emerging field of Diverse Intelligence seeks to identify, formalize, and understand commonalities in behavioral competencies across a wide range of implementations. Especially interesting are simple systems that provide unexpected examples of memory, decision-making, or problem-solving in substrates that at first glance do not appear to be complex enough to implement such capabilities. We seek to develop tools to help understand the minimal requirements for such capabilities, and to learn to recognize and predict basal forms of intelligence in unconventional substrates. Here, we apply novel analyses to the behavior of classical sorting algorithms, short pieces of code which have been studied for many decades. To study these sorting algorithms as a model of biological morphogenesis and its competencies, we break two formerly-ubiquitous assumptions: top-down control (instead, showing how each element within a array of numbers can exert minimal agency and implement sorting policies from the bottom up), and fully reliable hardware (instead, allowing some of the elements to be "damaged" and fail to execute the algorithm). We quantitatively characterize sorting activity as the traversal of a problem space, showing that arrays of autonomous elements sort themselves more reliably and robustly than traditional implementations in the presence of errors. Moreover, we find the ability to temporarily reduce progress in order to navigate around a defect, and unexpected clustering behavior among the elements in chimeric arrays whose elements follow one of two different algorithms. The discovery of emergent problem-solving capacities in simple, familiar algorithms contributes a new perspective to the field of Diverse Intelligence, showing how basal forms of intelligence can emerge in simple systems without being explicitly encoded in their underlying mechanics.
- [666] arXiv:2401.05391 (cross-list from cs.AR) [ pdf , ps , other ]
-
Title: Efficient LLM inference solution on Intel GPUAuthors: Hui Wu , Yi Gan , Feng Yuan , Jing Ma , Wei Zhu , Yutao Xu , Hong Zhu , Yuhua Zhu , Xiaoli Liu , Jinghui GuSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: Transformer based Large Language Models (LLMs) have been widely used in many fields, and the efficiency of LLM inference becomes hot topic in real applications. However, LLMs are usually complicatedly designed in model structure with massive operations and perform inference in the auto-regressive mode, making it a challenging task to design a system with high efficiency.
In this paper, we propose an efficient LLM inference solution with low latency and high throughput. Firstly, we simplify the LLM decoder layer by fusing data movement and element-wise operations to reduce the memory access frequency and lower system latency. We also propose a segment KV cache policy to keep key/value of the request and response tokens in separate physical memory for effective device memory management, helping enlarge the runtime batch size and improve system throughput. A customized Scaled-Dot-Product-Attention kernel is designed to match our fusion policy based on the segment KV cache solution. We implement our LLM inference solution on Intel GPU and publish it publicly. Compared with the standard HuggingFace implementation, the proposed solution achieves up to 7x lower token latency and 27x higher throughput for some popular LLMs on Intel GPU. - [667] arXiv:2401.05392 (cross-list from cs.CV) [ pdf , other ]
-
Title: AT-2FF: Adaptive Type-2 Fuzzy Filter for De-noising Images Corrupted with Salt-and-PepperAuthors: Vikas SinghSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Noise is inevitably common in digital images, leading to visual image deterioration. Therefore, a suitable filtering method is required to lessen the noise while preserving the image features (edges, corners, etc.). This paper presents the efficient type-2 fuzzy weighted mean filter with an adaptive threshold to remove the SAP noise. The present filter has two primary steps: The first stage categorizes images as lightly, medium, and heavily corrupted based on an adaptive threshold by comparing the M-ALD of processed pixels with the upper and lower MF of the type-2 fuzzy identifier. The second stage eliminates corrupted pixels by computing the appropriate weight using GMF with the mean and variance of the uncorrupted pixels in the filter window. Simulation results vividly show that the obtained denoised images preserve image features, i.e., edges, corners, and other sharp structures, compared with different filtering methods.
- [668] arXiv:2401.05398 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: GeoAI in Social ScienceAuthors: Wenwen LiComments: Artificial Intelligence; social science; deep learning; convergence; knowledge graphJournal-ref: Handbook of Spatial Analysis in the Social Sciences, 291 (2022)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: GeoAI, or geospatial artificial intelligence, is an exciting new area that leverages artificial intelligence (AI), geospatial big data, and massive computing power to solve problems with high automation and intelligence. This paper reviews the progress of AI in social science research, highlighting important advancements in using GeoAI to fill critical data and knowledge gaps. It also discusses the importance of breaking down data silos, accelerating convergence among GeoAI research methods, as well as moving GeoAI beyond geospatial benefits.
- [669] arXiv:2401.05399 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Automated Assessment of Students' Code Comprehension using LLMsSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Assessing student's answers and in particular natural language answers is a crucial challenge in the field of education. Advances in machine learning, including transformer-based models such as Large Language Models(LLMs), have led to significant progress in various natural language tasks. Nevertheless, amidst the growing trend of evaluating LLMs across diverse tasks, evaluating LLMs in the realm of automated answer assesment has not received much attention. To address this gap, we explore the potential of using LLMs for automated assessment of student's short and open-ended answer. Particularly, we use LLMs to compare students' explanations with expert explanations in the context of line-by-line explanations of computer programs.
For comparison purposes, we assess both Large Language Models (LLMs) and encoder-based Semantic Textual Similarity (STS) models in the context of assessing the correctness of students' explanation of computer code. Our findings indicate that LLMs, when prompted in few-shot and chain-of-thought setting perform comparable to fine-tuned encoder-based models in evaluating students' short answers in programming domain. - [670] arXiv:2401.05400 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Collaborative Learning with Artificial Intelligence Speakers (CLAIS): Pre-Service Elementary Science Teachers' Responses to the PrototypeSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: This research aims to demonstrate that AI can function not only as a tool for learning, but also as an intelligent agent with which humans can engage in collaborative learning (CL) to change epistemic practices in science classrooms. We adopted a design and development research approach, following the Analysis, Design, Development, Implementation and Evaluation (ADDIE) model, to prototype a tangible instructional system called Collaborative Learning with AI Speakers (CLAIS). The CLAIS system is designed to have 3-4 human learners join an AI speaker to form a small group, where humans and AI are considered as peers participating in the Jigsaw learning process. The development was carried out using the NUGU AI speaker platform. The CLAIS system was successfully implemented in a Science Education course session with 15 pre-service elementary science teachers. The participants evaluated the CLAIS system through mixed methods surveys as teachers, learners, peers, and users. Quantitative data showed that the participants' Intelligent-Technological, Pedagogical, And Content Knowledge was significantly increased after the CLAIS session, the perception of the CLAIS learning experience was positive, the peer assessment on AI speakers and human peers was different, and the user experience was ambivalent. Qualitative data showed that the participants anticipated future changes in the epistemic process in science classrooms, while acknowledging technical issues such as speech recognition performance and response latency. This study highlights the potential of Human-AI Collaboration for knowledge co-construction in authentic classroom settings and exemplify how AI could shape the future landscape of epistemic practices in the classroom.
- [671] arXiv:2401.05401 (cross-list from cs.CV) [ pdf , other ]
-
Title: Domain Similarity-Perceived Label Assignment for Domain Generalized Underwater Object DetectionComments: 10 pages,7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The inherent characteristics and light fluctuations of water bodies give rise to the huge difference between different layers and regions in underwater environments. When the test set is collected in a different marine area from the training set, the issue of domain shift emerges, significantly compromising the model's ability to generalize. The Domain Adversarial Learning (DAL) training strategy has been previously utilized to tackle such challenges. However, DAL heavily depends on manually one-hot domain labels, which implies no difference among the samples in the same domain. Such an assumption results in the instability of DAL. This paper introduces the concept of Domain Similarity-Perceived Label Assignment (DSP). The domain label for each image is regarded as its similarity to the specified domains. Through domain-specific data augmentation techniques, we achieved state-of-the-art results on the underwater cross-domain object detection benchmark S-UODAC2020. Furthermore, we validated the effectiveness of our method in the Cityscapes dataset.
- [672] arXiv:2401.05403 (cross-list from cs.CY) [ pdf , other ]
-
Title: The Key Artificial Intelligence Technologies in Early Childhood Education: A ReviewComments: 39 pages, 9 figures, 4 tablesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Artificial Intelligence (AI) technologies have been applied in various domains, including early childhood education (ECE). Integration of AI educational technology is a recent significant trend in ECE. Currently, there are more and more studies of AI in ECE. To date, there is a lack of survey articles that discuss the studies of AI in ECE. In this paper, we provide an up-to-date and in-depth overview of the key AI technologies in ECE that provides a historical perspective, summarizes the representative works, outlines open questions, discusses the trends and challenges through a detailed bibliometric analysis, and provides insightful recommendations for future research. We mainly discuss the studies that apply AI-based robots and AI technologies to ECE, including improving the social interaction of children with an autism spectrum disorder. This paper significantly contributes to provide an up-to-date and in-depth survey that is suitable as introductory material for beginners to AI in ECE, as well as supplementary material for advanced users.
- [673] arXiv:2401.05412 (cross-list from cs.CV) [ pdf , other ]
-
Title: Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted with Textual SemanticsComments: Accepted by AAAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Leveraging wearable devices for motion reconstruction has emerged as an economical and viable technique. Certain methodologies employ sparse Inertial Measurement Units (IMUs) on the human body and harness data-driven strategies to model human poses. However, the reconstruction of motion based solely on sparse IMUs data is inherently fraught with ambiguity, a consequence of numerous identical IMU readings corresponding to different poses. In this paper, we explore the spatial importance of multiple sensors, supervised by text that describes specific actions. Specifically, uncertainty is introduced to derive weighted features for each IMU. We also design a Hierarchical Temporal Transformer (HTT) and apply contrastive learning to achieve precise temporal and feature alignment of sensor data with textual semantics. Experimental results demonstrate our proposed approach achieves significant improvements in multiple metrics compared to existing methods. Notably, with textual supervision, our method not only differentiates between ambiguous actions such as sitting and standing but also produces more precise and natural motion.
- [674] arXiv:2401.05432 (cross-list from cs.LG) [ pdf , other ]
-
Title: TEN-GUARD: Tensor Decomposition for Backdoor Attack Detection in Deep Neural NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: As deep neural networks and the datasets used to train them get larger, the default approach to integrating them into research and commercial projects is to download a pre-trained model and fine tune it. But these models can have uncertain provenance, opening up the possibility that they embed hidden malicious behavior such as trojans or backdoors, where small changes to an input (triggers) can cause the model to produce incorrect outputs (e.g., to misclassify). This paper introduces a novel approach to backdoor detection that uses two tensor decomposition methods applied to network activations. This has a number of advantages relative to existing detection methods, including the ability to analyze multiple models at the same time, working across a wide variety of network architectures, making no assumptions about the nature of triggers used to alter network behavior, and being computationally efficient. We provide a detailed description of the detection pipeline along with results on models trained on the MNIST digit dataset, CIFAR-10 dataset, and two difficult datasets from NIST's TrojAI competition. These results show that our method detects backdoored networks more accurately and efficiently than current state-of-the-art methods.
- [675] arXiv:2401.05433 (cross-list from cs.CL) [ pdf , other ]
-
Title: Enhancing Essay Scoring with Adversarial Weights Perturbation and Metric-specific AttentionPoolingComments: This article was accepted by 2023 International Conference on Information Network and Computer Communications(INCC)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The objective of this study is to improve automated feedback tools designed for English Language Learners (ELLs) through the utilization of data science techniques encompassing machine learning, natural language processing, and educational data analytics. Automated essay scoring (AES) research has made strides in evaluating written essays, but it often overlooks the specific needs of English Language Learners (ELLs) in language development. This study explores the application of BERT-related techniques to enhance the assessment of ELLs' writing proficiency within AES.
To address the specific needs of ELLs, we propose the use of DeBERTa, a state-of-the-art neural language model, for improving automated feedback tools. DeBERTa, pretrained on large text corpora using self-supervised learning, learns universal language representations adaptable to various natural language understanding tasks. The model incorporates several innovative techniques, including adversarial training through Adversarial Weights Perturbation (AWP) and Metric-specific AttentionPooling (6 kinds of AP) for each label in the competition.
The primary focus of this research is to investigate the impact of hyperparameters, particularly the adversarial learning rate, on the performance of the model. By fine-tuning the hyperparameter tuning process, including the influence of 6AP and AWP, the resulting models can provide more accurate evaluations of language proficiency and support tailored learning tasks for ELLs. This work has the potential to significantly benefit ELLs by improving their English language proficiency and facilitating their educational journey. - [676] arXiv:2401.05442 (cross-list from cs.LG) [ pdf , other ]
-
Title: Functional Graphical Models: Structure Enables Offline Data-Driven OptimizationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: While machine learning models are typically trained to solve prediction problems, we might often want to use them for optimization problems. For example, given a dataset of proteins and their corresponding fluorescence levels, we might want to optimize for a new protein with the highest possible fluorescence. This kind of data-driven optimization (DDO) presents a range of challenges beyond those in standard prediction problems, since we need models that successfully predict the performance of new designs that are better than the best designs seen in the training set. It is not clear theoretically when existing approaches can even perform better than the naive approach that simply selects the best design in the dataset. In this paper, we study how structure can enable sample-efficient data-driven optimization. To formalize the notion of structure, we introduce functional graphical models (FGMs) and show theoretically how they can provide for principled data-driven optimization by decomposing the original high-dimensional optimization problem into smaller sub-problems. This allows us to derive much more practical regret bounds for DDO, and the result implies that DDO with FGMs can achieve nearly optimal designs in situations where naive approaches fail due to insufficient coverage of the offline data. We further present a data-driven optimization algorithm that inferes the FGM structure itself, either over the original input variables or a latent variable representation of the inputs.
- [677] arXiv:2401.05443 (cross-list from cs.SE) [ pdf , other ]
-
Title: LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control SystemsAuthors: Mohamad Fakih , Rahul Dharmaji , Yasamin Moghaddas , Gustavo Quiros Araya , Oluwatosin Ogundare , Mohammad Abdullah Al FaruqueComments: 12 pages; 8 figures; Appearing in the 46th International Conference on Software Engineering: Software Engineering in Practice; for demo website, see this https URLSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
Abstract: Although Large Language Models (LLMs) have established pre-dominance in automated code generation, they are not devoid of shortcomings. The pertinent issues primarily relate to the absence of execution guarantees for generated code, a lack of explainability, and suboptimal support for essential but niche programming languages. State-of-the-art LLMs such as GPT-4 and LLaMa2 fail to produce valid programs for Industrial Control Systems (ICS) operated by Programmable Logic Controllers (PLCs). We propose LLM4PLC, a user-guided iterative pipeline leveraging user feedback and external verification tools including grammar checkers, compilers and SMV verifiers to guide the LLM's generation. We further enhance the generation potential of LLM by employing Prompt Engineering and model fine-tuning through the creation and usage of LoRAs. We validate this system using a FischerTechnik Manufacturing TestBed (MFTB), illustrating how LLMs can evolve from generating structurally flawed code to producing verifiably correct programs for industrial applications. We run a complete test suite on GPT-3.5, GPT-4, Code Llama-7B, a fine-tuned Code Llama-7B model, Code Llama-34B, and a fine-tuned Code Llama-34B model. The proposed pipeline improved the generation success rate from 47% to 72%, and the Survey-of-Experts code quality from 2.25/10 to 7.75/10. To promote open research, we share the complete experimental setup, the LLM Fine-Tuning Weights, and the video demonstrations of the different programs on our dedicated webpage.
- [678] arXiv:2401.05444 (cross-list from cs.NE) [ pdf , other ]
-
Title: Fully Spiking Actor Network with Intra-layer Connections for Reinforcement LearningComments: 13 pages, 6 figuresSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: With the help of special neuromorphic hardware, spiking neural networks (SNNs) are expected to realize artificial intelligence (AI) with less energy consumption. It provides a promising energy-efficient way for realistic control tasks by combining SNNs with deep reinforcement learning (DRL). In this paper, we focus on the task where the agent needs to learn multi-dimensional deterministic policies to control, which is very common in real scenarios. Recently, the surrogate gradient method has been utilized for training multi-layer SNNs, which allows SNNs to achieve comparable performance with the corresponding deep networks in this task. Most existing spike-based RL methods take the firing rate as the output of SNNs, and convert it to represent continuous action space (i.e., the deterministic policy) through a fully-connected (FC) layer. However, the decimal characteristic of the firing rate brings the floating-point matrix operations to the FC layer, making the whole SNN unable to deploy on the neuromorphic hardware directly. To develop a fully spiking actor network without any floating-point matrix operations, we draw inspiration from the non-spiking interneurons found in insects and employ the membrane voltage of the non-spiking neurons to represent the action. Before the non-spiking neurons, multiple population neurons are introduced to decode different dimensions of actions. Since each population is used to decode a dimension of action, we argue that the neurons in each population should be connected in time domain and space domain. Hence, the intra-layer connections are used in output populations to enhance the representation capacity. Finally, we propose a fully spiking actor network with intra-layer connections (ILC-SAN).
- [679] arXiv:2401.05453 (cross-list from cs.LG) [ pdf , other ]
-
Title: Dimensionality-Aware Outlier Detection: Theoretical and Experimental AnalysisAuthors: Alastair Anderberg , James Bailey , Ricardo J. G. B. Campello , Michael E. Houle , Henrique O. Marques , Miloš Radovanović , Arthur ZimekComments: 13 pages, 3 figures. Extended version of a paper accepted for publication at the SIAM International Conference on Data Mining (SDM24)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We present a nonparametric method for outlier detection that takes full account of local variations in intrinsic dimensionality within the dataset. Using the theory of Local Intrinsic Dimensionality (LID), our 'dimensionality-aware' outlier detection method, DAO, is derived as an estimator of an asymptotic local expected density ratio involving the query point and a close neighbor drawn at random. The dimensionality-aware behavior of DAO is due to its use of local estimation of LID values in a theoretically-justified way. Through comprehensive experimentation on more than 800 synthetic and real datasets, we show that DAO significantly outperforms three popular and important benchmark outlier detection methods: Local Outlier Factor (LOF), Simplified LOF, and kNN.
- [680] arXiv:2401.05458 (cross-list from cs.LG) [ pdf , other ]
-
Title: CoLafier: Collaborative Noisy Label Purifier With Local Intrinsic Dimensionality GuidanceComments: This work is accepted by SIAM International Conference on Data Mining (SDM24)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep neural networks (DNNs) have advanced many machine learning tasks, but their performance is often harmed by noisy labels in real-world data. Addressing this, we introduce CoLafier, a novel approach that uses Local Intrinsic Dimensionality (LID) for learning with noisy labels. CoLafier consists of two subnets: LID-dis and LID-gen. LID-dis is a specialized classifier. Trained with our uniquely crafted scheme, LID-dis consumes both a sample's features and its label to predict the label - which allows it to produce an enhanced internal representation. We observe that LID scores computed from this representation effectively distinguish between correct and incorrect labels across various noise scenarios. In contrast to LID-dis, LID-gen, functioning as a regular classifier, operates solely on the sample's features. During training, CoLafier utilizes two augmented views per instance to feed both subnets. CoLafier considers the LID scores from the two views as produced by LID-dis to assign weights in an adapted loss function for both subnets. Concurrently, LID-gen, serving as classifier, suggests pseudo-labels. LID-dis then processes these pseudo-labels along with two views to derive LID scores. Finally, these LID scores along with the differences in predictions from the two subnets guide the label update decisions. This dual-view and dual-subnet approach enhances the overall reliability of the framework. Upon completion of the training, we deploy the LID-gen subnet of CoLafier as the final classification model. CoLafier demonstrates improved prediction accuracy, surpassing existing methods, particularly under severe label noise. For more details, see the code at this https URL .
- [681] arXiv:2401.05459 (cross-list from cs.HC) [ pdf , other ]
-
Title: Personal LLM Agents: Insights and Survey about the Capability, Efficiency and SecurityAuthors: Yuanchun Li , Hao Wen , Weijun Wang , Xiangyu Li , Yizhen Yuan , Guohong Liu , Jiacheng Liu , Wenxing Xu , Xiang Wang , Yi Sun , Rui Kong , Yile Wang , Hanfei Geng , Jian Luan , Xuefeng Jin , Zilong Ye , Guanjing Xiong , Fan Zhang , Xiang Li , Mengwei Xu , Zhijun Li , Peng Li , Yang Liu , Ya-Qin Zhang , Yunxin LiuComments: this https URLSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: Since the advent of personal computing devices, intelligent personal assistants (IPAs) have been one of the key technologies that researchers and engineers have focused on, aiming to help users efficiently obtain information and execute tasks, and provide users with more intelligent, convenient, and rich interaction experiences. With the development of smartphones and IoT, computing and sensing devices have become ubiquitous, greatly expanding the boundaries of IPAs. However, due to the lack of capabilities such as user intent understanding, task planning, tool using, and personal data management etc., existing IPAs still have limited practicality and scalability. Recently, the emergence of foundation models, represented by large language models (LLMs), brings new opportunities for the development of IPAs. With the powerful semantic understanding and reasoning capabilities, LLM can enable intelligent agents to solve complex problems autonomously. In this paper, we focus on Personal LLM Agents, which are LLM-based agents that are deeply integrated with personal data and personal devices and used for personal assistance. We envision that Personal LLM Agents will become a major software paradigm for end-users in the upcoming era. To realize this vision, we take the first step to discuss several important questions about Personal LLM Agents, including their architecture, capability, efficiency and security. We start by summarizing the key components and design choices in the architecture of Personal LLM Agents, followed by an in-depth analysis of the opinions collected from domain experts. Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.
- [682] arXiv:2401.05461 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: The two-way knowledge interaction interface between humans and neural networksSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Despite neural networks (NN) have been widely applied in various fields and generally outperforms humans, they still lack interpretability to a certain extent, and humans are unable to intuitively understand the decision logic of NN. This also hinders the knowledge interaction between humans and NN, preventing humans from getting involved to give direct guidance when NN's decisions go wrong. While recent research in explainable AI has achieved interpretability of NN from various perspectives, it has not yet provided effective methods for knowledge exchange between humans and NN. To address this problem, we constructed a two-way interaction interface that uses structured representations of visual concepts and their relationships as the "language" for knowledge exchange between humans and NN. Specifically, NN provide intuitive reasoning explanations to humans based on the class-specific structural concepts graph (C-SCG). On the other hand, humans can modify the biases present in the C-SCG through their prior knowledge and reasoning ability, and thus provide direct knowledge guidance to NN through this interface. Through experimental validation, based on this interaction interface, NN can provide humans with easily understandable explanations of the reasoning process. Furthermore, human involvement and prior knowledge can directly and effectively contribute to enhancing the performance of NN.
- [683] arXiv:2401.05467 (cross-list from cs.LG) [ pdf , other ]
-
Title: Machine Teaching for Building Modular AI Agents based on Zero-shot LearnersComments: 10 pages, 5 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The recent advances in large language models (LLMs) have led to the creation of many modular AI agents. These agents employ LLMs as zero-shot learners to perform sub-tasks in order to solve complex tasks set forth by human users. We propose an approach to enhance the robustness and performance of modular AI agents that utilize LLMs as zero-shot learners. Our iterative machine teaching method offers an efficient way to teach AI agents over time with limited human feedback, addressing the limit posed by the quality of zero-shot learning. We advocate leveraging the data traces from initial deployments and outputs or annotations from the zero-shot learners to train smaller and task-specific substitute models which can reduce both the monetary costs and environmental impact. Our machine teaching process avails human expertise to correct examples with a high likelihood of misannotations. Results on three tasks, common to conversational AI agents, show that close-to-oracle performance can be achieved with supervision on 20-70% of the dataset depending upon the complexity of the task and performance of zero-shot learners.
- [684] arXiv:2401.05468 (cross-list from cs.SI) [ pdf , other ]
-
Title: Introducing New Node Prediction in Graph Mining: Predicting All Links from Isolated Nodes with Graph Neural NetworksSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper introduces a new problem in the field of graph mining and social network analysis called new node prediction. More technically, the task can be categorized as zero-shot out-of-graph all-links prediction. This challenging problem aims to predict all links from a new, isolated, and unobserved node that was previously disconnected from the graph. Unlike classic approaches to link prediction (including few-shot out-of-graph link prediction), this problem presents two key differences: (1) the new node has no existing links from which to extract patterns for new predictions; and (2) the goal is to predict not just one, but all the links of this new node, or at least a significant part of them. Experiments demonstrate that an architecture based on Deep Graph Neural Networks can learn to solve this challenging problem in a bibliographic citation network.
- [685] arXiv:2401.05476 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: CADgpt: Harnessing Natural Language Processing for 3D Modelling to Enhance Computer-Aided Design WorkflowsAuthors: Timo KapsalisComments: 10 pages, 4 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: This paper introduces CADgpt, an innovative plugin integrating Natural Language Processing (NLP) with Rhino3D for enhancing 3D modelling in computer-aided design (CAD) environments. Leveraging OpenAI's GPT-4, CADgpt simplifies the CAD interface, enabling users, particularly beginners, to perform complex 3D modelling tasks through intuitive natural language commands. This approach significantly reduces the learning curve associated with traditional CAD software, fostering a more inclusive and engaging educational environment. The paper discusses CADgpt's technical architecture, including its integration within Rhino3D and the adaptation of GPT-4 capabilities for CAD tasks. It presents case studies demonstrating CADgpt's efficacy in various design scenarios, highlighting its potential to democratise design education by making sophisticated design tools accessible to a broader range of students. The discussion further explores CADgpt's implications for pedagogy and curriculum development, emphasising its role in enhancing creative exploration and conceptual thinking in design education.
Keywords: Natural Language Processing, Computer-Aided Design, 3D Modelling, Design Automation, Design Education, Architectural Education - [686] arXiv:2401.05477 (cross-list from cs.LG) [ pdf , other ]
-
Title: Standardizing Your Training Process for Human Activity Recognition Models: A Comprehensive Review in the Tunable FactorsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, deep learning has emerged as a potent tool across a multitude of domains, leading to a surge in research pertaining to its application in the wearable human activity recognition (WHAR) domain. Despite the rapid development, concerns have been raised about the lack of standardization and consistency in the procedures used for experimental model training, which may affect the reproducibility and reliability of research results. In this paper, we provide an exhaustive review of contemporary deep learning research in the field of WHAR and collate information pertaining to the training procedure employed in various studies. Our findings suggest that a major trend is the lack of detail provided by model training protocols. Besides, to gain a clearer understanding of the impact of missing descriptions, we utilize a control variables approach to assess the impact of key tunable components (e.g., optimization techniques and early stopping criteria) on the inter-subject generalization capabilities of HAR models. With insights from the analyses, we define a novel integrated training procedure tailored to the WHAR model. Empirical results derived using five well-known \ac{whar} benchmark datasets and three classical HAR model architectures demonstrate the effectiveness of our proposed methodology: in particular, there is a significant improvement in macro F1 leave one subject out cross-validation performance.
- [687] arXiv:2401.05478 (cross-list from cs.SI) [ pdf , other ]
-
Title: Population Graph Cross-Network Node Classification for Autism Detection Across Sample GroupsComments: To appear ICDM DMBIH workshop 2023Subjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Graph neural networks (GNN) are a powerful tool for combining imaging and non-imaging medical information for node classification tasks. Cross-network node classification extends GNN techniques to account for domain drift, allowing for node classification on an unlabeled target network. In this paper we present OTGCN, a powerful, novel approach to cross-network node classification. This approach leans on concepts from graph convolutional networks to harness insights from graph data structures while simultaneously applying strategies rooted in optimal transport to correct for the domain drift that can occur between samples from different data collection sites. This blended approach provides a practical solution for scenarios with many distinct forms of data collected across different locations and equipment. We demonstrate the effectiveness of this approach at classifying Autism Spectrum Disorder subjects using a blend of imaging and non-imaging data.
- [688] arXiv:2401.05502 (cross-list from cs.DS) [ pdf , ps , other ]
-
Title: Diversity-aware clustering: Computational Complexity and Approximation AlgorithmsComments: Algorithmic Fairness, Fair Clustering, Diversity-aware Clustering, Intersectionaly, Subgroup fairnessSubjects: Data Structures and Algorithms (cs.DS) ; Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Machine Learning (cs.LG)
Abstract: In this work, we study diversity-aware clustering problems where the data points are associated with multiple attributes resulting in intersecting groups. A clustering solution need to ensure that a minimum number of cluster centers are chosen from each group while simultaneously minimizing the clustering objective, which can be either $k$-median, $k$-means or $k$-supplier. We present parameterized approximation algorithms with approximation ratios $1+ \frac{2}{e}$, $1+\frac{8}{e}$ and $3$ for diversity-aware $k$-median, diversity-aware $k$-means and diversity-aware $k$-supplier, respectively. The approximation ratios are tight assuming Gap-ETH and FPT $\neq$ W[2]. For fair $k$-median and fair $k$-means with disjoint faicility groups, we present parameterized approximation algorithm with approximation ratios $1+\frac{2}{e}$ and $1+\frac{8}{e}$, respectively. For fair $k$-supplier with disjoint facility groups, we present a polynomial-time approximation algorithm with factor $3$, improving the previous best known approximation ratio of factor $5$.
- [689] arXiv:2401.05507 (cross-list from cs.CL) [ pdf , other ]
-
Title: InfiAgent-DABench: Evaluating Agents on Data Analysis TasksAuthors: Xueyu Hu , Ziyu Zhao , Shuang Wei , Ziwei Chai , Qianli Ma , Guoyin Wang , Xuwu Wang , Jing Su , Jingjing Xu , Ming Zhu , Yao Cheng , Jianbo Yuan , Jiwei Li , Kun Kuang , Yang Yang , Hongxia Yang , Fei WuComments: 27 pages, 7 figures, work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we introduce InfiAgent-DABench, the first benchmark specifically designed to evaluate LLM-based agents on data analysis tasks. These tasks require agents to end-to-end solving complex tasks by interacting with an execution environment. This benchmark contains DAEval, a dataset consisting of 257 data analysis questions derived from 52 CSV files, and an agent framework which incorporates LLMs to serve as data analysis agents for both serving and evaluation. Since data analysis questions are often open-ended and hard to evaluate without human supervision, we adopt a format-prompting technique to convert each question into a closed-form format so that they can be automatically evaluated. Our extensive benchmarking of 34 LLMs uncovers the current challenges encountered in data analysis tasks. In addition, building on top of our agent framework, we develop a specialized agent, DAAgent, which surpasses GPT-3.5 by 3.9% on DABench. Evaluation datasets and toolkits for InfiAgent-DABench are released at this https URL .
- [690] arXiv:2401.05509 (cross-list from cs.CR) [ pdf , other ]
-
Title: Optimized Ensemble Model Towards Secured Industrial IoT DevicesAuthors: MohammadNoor InjadatComments: Accepted and presented in 24th International Arab Conference on Information Technology (ACIT'2023)Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The continued growth in the deployment of Internet-of-Things (IoT) devices has been fueled by the increased connectivity demand, particularly in industrial environments. However, this has led to an increase in the number of network related attacks due to the increased number of potential attack surfaces. Industrial IoT (IIoT) devices are prone to various network related attacks that can have severe consequences on the manufacturing process as well as on the safety of the workers in the manufacturing plant. One promising solution that has emerged in recent years for attack detection is Machine learning (ML). More specifically, ensemble learning models have shown great promise in improving the performance of the underlying ML models. Accordingly, this paper proposes a framework based on the combined use of Bayesian Optimization-Gaussian Process (BO-GP) with an ensemble tree-based learning model to improve the performance of intrusion and attack detection in IIoT environments. The proposed framework's performance is evaluated using the Windows 10 dataset collected by the Cyber Range and IoT labs at University of New South Wales. Experimental results illustrate the improvement in detection accuracy, precision, and F-score when compared to standard tree and ensemble tree models.
- [691] arXiv:2401.05516 (cross-list from cs.CV) [ pdf , other ]
-
Title: FPRF: Feed-Forward Photorealistic Style Transfer of Large-Scale 3D Neural Radiance FieldsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: We present FPRF, a feed-forward photorealistic style transfer method for large-scale 3D neural radiance fields. FPRF stylizes large-scale 3D scenes with arbitrary, multiple style reference images without additional optimization while preserving multi-view appearance consistency. Prior arts required tedious per-style/-scene optimization and were limited to small-scale 3D scenes. FPRF efficiently stylizes large-scale 3D scenes by introducing a style-decomposed 3D neural radiance field, which inherits AdaIN's feed-forward stylization machinery, supporting arbitrary style reference images. Furthermore, FPRF supports multi-reference stylization with the semantic correspondence matching and local AdaIN, which adds diverse user control for 3D scene styles. FPRF also preserves multi-view consistency by applying semantic matching and style transfer processes directly onto queried features in 3D space. In experiments, we demonstrate that FPRF achieves favorable photorealistic quality 3D scene stylization for large-scale scenes with diverse reference images. Project page: this https URL
- [692] arXiv:2401.05518 (cross-list from cs.LG) [ pdf , other ]
-
Title: Correlated Quantization for Faster Nonconvex Distributed OptimizationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Abstract: Quantization (Alistarh et al., 2017) is an important (stochastic) compression technique that reduces the volume of transmitted bits during each communication round in distributed model training. Suresh et al. (2022) introduce correlated quantizers and show their advantages over independent counterparts by analyzing distributed SGD communication complexity. We analyze the forefront distributed non-convex optimization algorithm MARINA (Gorbunov et al., 2022) utilizing the proposed correlated quantizers and show that it outperforms the original MARINA and distributed SGD of Suresh et al. (2022) with regard to the communication complexity. We significantly refine the original analysis of MARINA without any additional assumptions using the weighted Hessian variance (Tyurin et al., 2022), and then we expand the theoretical framework of MARINA to accommodate a substantially broader range of potentially correlated and biased compressors, thus dilating the applicability of the method beyond the conventional independent unbiased compressor setup. Extensive experimental results corroborate our theoretical findings.
- [693] arXiv:2401.05520 (cross-list from cs.CV) [ pdf , other ]
-
Title: From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho HeritageAuthors: Marcellus Amadeus , William Alberto Cruz Castañeda , André Felipe Zanella , Felipe Rodrigues Perche MahlowSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Generative AI has become pervasive in society, witnessing significant advancements in various domains. Particularly in the realm of Text-to-Image (TTI) models, Latent Diffusion Models (LDMs), showcase remarkable capabilities in generating visual content based on textual prompts. This paper addresses the potential of LDMs in representing local cultural concepts, historical figures, and endangered species. In this study, we use the cultural heritage of Rio Grande do Sul (RS), Brazil, as an illustrative case. Our objective is to contribute to the broader understanding of how generative models can help to capture and preserve the cultural and historical identity of regions. The paper outlines the methodology, including subject selection, dataset creation, and the fine-tuning process. The results showcase the images generated, alongside the challenges and feasibility of each concept. In conclusion, this work shows the power of these models to represent and preserve unique aspects of diverse regions and communities.
- [694] arXiv:2401.05521 (cross-list from cs.RO) [ pdf , other ]
-
Title: Current Effect-eliminated Optimal Target Assignment and Motion Planning for a Multi-UUV SystemComments: This paper was accepted by IEEE Transactions on Intelligent Transportation SystemsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: The paper presents an innovative approach (CBNNTAP) that addresses the complexities and challenges introduced by ocean currents when optimizing target assignment and motion planning for a multi-unmanned underwater vehicle (UUV) system. The core of the proposed algorithm involves the integration of several key components. Firstly, it incorporates a bio-inspired neural network-based (BINN) approach which predicts the most efficient paths for individual UUVs while simultaneously ensuring collision avoidance among the vehicles. Secondly, an efficient target assignment component is integrated by considering the path distances determined by the BINN algorithm. In addition, a critical innovation within the CBNNTAP algorithm is its capacity to address the disruptive effects of ocean currents, where an adjustment component is seamlessly integrated to counteract the deviations caused by these currents, which enhances the accuracy of both motion planning and target assignment for the UUVs. The effectiveness of the CBNNTAP algorithm is demonstrated through comprehensive simulation results and the outcomes underscore the superiority of the developed algorithm in nullifying the effects of static and dynamic ocean currents in 2D and 3D scenarios.
- [695] arXiv:2401.05544 (cross-list from cs.CL) [ pdf , other ]
-
Title: CodePrompt: Improving Source Code-Related Classification with Knowledge Features through Prompt LearningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Researchers have explored the potential of utilizing pre-trained language models, such as CodeBERT, to improve source code-related tasks. Previous studies have mainly relied on CodeBERT's text embedding capability and the `[CLS]' sentence embedding information as semantic representations for fine-tuning downstream source code-related tasks. However, these methods require additional neural network layers to extract effective features, resulting in higher computational costs. Furthermore, existing approaches have not leveraged the rich knowledge contained in both source code and related text, which can lead to lower accuracy. This paper presents a novel approach, CodePrompt, which utilizes rich knowledge recalled from a pre-trained model by prompt learning and an attention mechanism to improve source code-related classification tasks. Our approach initially motivates the language model with prompt information to retrieve abundant knowledge associated with the input as representative features, thus avoiding the need for additional neural network layers and reducing computational costs. Subsequently, we employ an attention mechanism to aggregate multiple layers of related knowledge for each task as final features to boost their accuracy. We conducted extensive experiments on four downstream source code-related tasks to evaluate our approach and our results demonstrate that CodePrompt achieves new state-of-the-art performance on the accuracy metric while also exhibiting computation cost-saving capabilities.
- [696] arXiv:2401.05566 (cross-list from cs.CR) [ pdf , other ]
-
Title: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety TrainingAuthors: Evan Hubinger , Carson Denison , Jesse Mu , Mike Lambert , Meg Tong , Monte MacDiarmid , Tamera Lanham , Daniel M. Ziegler , Tim Maxwell , Newton Cheng , Adam Jermyn , Amanda Askell , Ansh Radhakrishnan , Cem Anil , David Duvenaud , Deep Ganguli , Fazl Barez , Jack Clark , Kamal Ndousse , Kshitij Sachan , Michael Sellitto , Mrinank Sharma , Nova DasSarma , Roger Grosse , Shauna Kravec , Yuntao Bai , Zachary Witten , Marina Favaro , Jan Brauner , Holden Karnofsky , Paul Christiano , Samuel R. Bowman , Logan Graham , Jared Kaplan , Sören Mindermann , Ryan Greenblatt , Buck Shlegeris , Nicholas Schiefer , Ethan PerezComments: updated to add missing acknowledgementsSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Software Engineering (cs.SE)
Abstract: Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.
- [697] arXiv:2401.05570 (cross-list from cs.CV) [ pdf , other ]
-
Title: Siamese Networks with Soft Labels for Unsupervised Lesion Detection and Patch Pretraining on Screening MammogramsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Self-supervised learning has become a popular way to pretrain a deep learning model and then transfer it to perform downstream tasks. However, most of these methods are developed on large-scale image datasets that contain natural objects with clear textures, outlines, and distinct color contrasts. It remains uncertain whether these methods are equally effective for medical imaging, where the regions of interest often blend subtly and indistinctly with the surrounding tissues. In this study, we propose an alternative method that uses contralateral mammograms to train a neural network to encode similar embeddings when a pair contains both normal images and different embeddings when a pair contains normal and abnormal images. Our approach leverages the natural symmetry of human body as weak labels to learn to distinguish abnormal lesions from background tissues in a fully unsupervised manner. Our findings suggest that it's feasible by incorporating soft labels derived from the Euclidean distances between the embeddings of the image pairs into the Siamese network loss. Our method demonstrates superior performance in mammogram patch classification compared to existing self-supervised learning methods. This approach not only leverages a vast amount of image data effectively but also minimizes reliance on costly labels, a significant advantage particularly in the field of medical imaging.
- [698] arXiv:2401.05572 (cross-list from cs.LG) [ pdf , other ]
-
Title: Innate-Values-driven Reinforcement Learning for Cooperative Multi-Agent SystemsAuthors: Qin YangComments: This paper was accepted by the 38th AAAI 2024 workshop: "Cooperative Multi-Agent Systems Decision-Making and Learning: From Individual Needs to Swarm Intelligence"Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
Abstract: Innate values describe agents' intrinsic motivations, which reflect their inherent interests and preferences to pursue goals and drive them to develop diverse skills satisfying their various needs. The essence of reinforcement learning (RL) is learning from interaction based on reward-driven (such as utilities) behaviors, much like natural agents. It is an excellent model to describe the innate-values-driven (IV) behaviors of AI agents. Especially in multi-agent systems (MAS), building the awareness of AI agents to balance the group utilities and system costs and satisfy group members' needs in their cooperation is a crucial problem for individuals learning to support their community and integrate human society in the long term. This paper proposes a hierarchical compound intrinsic value reinforcement learning model -- innate-values-driven reinforcement learning termed IVRL to describe the complex behaviors of multi-agent interaction in their cooperation. We implement the IVRL architecture in the StarCraft Multi-Agent Challenge (SMAC) environment and compare the cooperative performance within three characteristics of innate value agents (Coward, Neutral, and Reckless) through three benchmark multi-agent RL algorithms: QMIX, IQL, and QTRAN. The results demonstrate that by organizing individual various needs rationally, the group can achieve better performance with lower costs effectively.
- [699] arXiv:2401.05584 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: FourCastNeXt: Optimizing FourCastNet Training for Limited ComputeAuthors: Edison Guo , Maruf Ahmed , Yue Sun , Rui Yang , Harrison Cook , Tennessee Leeuwenburg , Ben EvansComments: Major revision. All prior content (text, figures, table) has been updated. Additionally, new text, tables and figures have been added. Updated title. Updated author listSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: FourCastNeXt is an optimization of FourCastNet - a global machine learning weather forecasting model - that performs with a comparable level of accuracy and can be trained using around 5% of the original FourCastNet computational requirements. This technical report presents strategies for model optimization that maintain similar performance as measured by the root-mean-square error (RMSE) of the modelled variables. By providing a model with very low comparative training costs, FourCastNeXt makes Neural Earth System Modelling much more accessible to researchers looking to conduct training experiments and ablation studies. FourCastNeXt training and inference code are available at this https URL
- [700] arXiv:2401.05596 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine TranslationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Low-resource languages (LRLs) face challenges in supervised neural machine translation due to limited parallel data, prompting research into unsupervised methods. Unsupervised neural machine translation (UNMT) methods, including back-translation, transfer learning, and pivot-based translation, offer practical solutions for LRL translation, but they are hindered by issues like synthetic data noise, language bias, and error propagation, which can potentially be mitigated by Large Language Models (LLMs). LLMs have advanced NMT with in-context learning (ICL) and supervised fine-tuning methods, but insufficient training data results in poor performance in LRLs. We argue that LLMs can mitigate the linguistic noise with auxiliary languages to improve translations in LRLs. In this paper, we propose Probability-driven Meta-graph Prompter (POMP), a novel approach employing a dynamic, sampling-based graph of multiple auxiliary languages to enhance LLMs' translation capabilities for LRLs. POMP involves constructing a directed acyclic meta-graph for each source language, from which we dynamically sample multiple paths to prompt LLMs to mitigate the linguistic noise and improve translations during training. We use the BLEURT metric to evaluate the translations and back-propagate rewards, estimated by scores, to update the probabilities of auxiliary languages in the paths. Our experiments show significant improvements in the translation quality of three LRLs, demonstrating the effectiveness of our approach.
- [701] arXiv:2401.05604 (cross-list from cs.CL) [ pdf , other ]
-
Title: REBUS: A Robust Evaluation Benchmark of Understanding SymbolsAuthors: Andrew Gritsevskiy , Arjun Panickssery , Aaron Kirtland , Derik Kauffman , Hans Gundlach , Irina Gritsevskaya , Joe Cavanagh , Jonathan Chiang , Lydia La Roux , Michelle HungComments: 17 pages, 5 figures. For code, see this http URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Abstract: We propose a new benchmark evaluating the performance of multimodal large language models on rebus puzzles. The dataset covers 333 original examples of image-based wordplay, cluing 13 categories such as movies, composers, major cities, and food. To achieve good performance on the benchmark of identifying the clued word or phrase, models must combine image recognition and string manipulation with hypothesis testing, multi-step reasoning, and an understanding of human cognition, making for a complex, multimodal evaluation of capabilities. We find that proprietary models such as GPT-4V and Gemini Pro significantly outperform all other tested models. However, even the best model has a final accuracy of just 24%, highlighting the need for substantial improvements in reasoning. Further, models rarely understand all parts of a puzzle, and are almost always incapable of retroactively explaining the correct answer. Our benchmark can therefore be used to identify major shortcomings in the knowledge and reasoning of multimodal large language models.
- [702] arXiv:2401.05610 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph Q-Learning for Combinatorial OptimizationJournal-ref: GLIndA Workshop NeurIPS 2022Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph-structured data is ubiquitous throughout natural and social sciences, and Graph Neural Networks (GNNs) have recently been shown to be effective at solving prediction and inference problems on graph data. In this paper, we propose and demonstrate that GNNs can be applied to solve Combinatorial Optimization (CO) problems. CO concerns optimizing a function over a discrete solution space that is often intractably large. To learn to solve CO problems, we formulate the optimization process as a sequential decision making problem, where the return is related to how close the candidate solution is to optimality. We use a GNN to learn a policy to iteratively build increasingly promising candidate solutions. We present preliminary evidence that GNNs trained through Q-Learning can solve CO problems with performance approaching state-of-the-art heuristic-based solvers, using only a fraction of the parameters and training time.
- [703] arXiv:2401.05618 (cross-list from cs.CL) [ pdf , other ]
-
Title: The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language ModelsComments: All code, data, and supplemental materials are available on GitHub at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we introduce Concise Chain-of-Thought (CCoT) prompting. We compared standard CoT and CCoT prompts to see how conciseness impacts response length and correct-answer accuracy. We evaluated this using GPT-3.5 and GPT-4 with a multiple-choice question-and-answer (MCQA) benchmark. CCoT reduced average response length by 48.70% for both GPT-3.5 and GPT-4 while having a negligible impact on problem-solving performance. However, on math problems, GPT-3.5 with CCoT incurs a performance penalty of 27.69%. Overall, CCoT leads to an average per-token cost reduction of 22.67%. These results have practical implications for AI systems engineers using LLMs to solve real-world problems with CoT prompt-engineering techniques. In addition, these results provide more general insight for AI researchers studying the emergent behavior of step-by-step reasoning in LLMs.
- [704] arXiv:2401.05631 (cross-list from cs.HC) [ pdf , other ]
-
Title: DrawTalking: Building Interactive Worlds by Sketching and SpeakingSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Graphics (cs.GR)
Abstract: We introduce DrawTalking, an approach to building and controlling interactive worlds by sketching and speaking. It emphasizes user control and flexibility, and gives programming-like capability without requiring code. We built a prototype to demonstrate it. An early open-ended study shows the mechanics resonate and are applicable to many creative-exploratory use cases, with the potential to inspire and inform research in future natural interfaces for creative exploration and authoring.
- [705] arXiv:2401.05667 (cross-list from cs.LG) [ pdf , other ]
-
Title: EsaCL: Efficient Continual Learning of Sparse ModelsComments: SDM 2024 : SIAM International Conference on Data MiningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: A key challenge in the continual learning setting is to efficiently learn a sequence of tasks without forgetting how to perform previously learned tasks. Many existing approaches to this problem work by either retraining the model on previous tasks or by expanding the model to accommodate new tasks. However, these approaches typically suffer from increased storage and computational requirements, a problem that is worsened in the case of sparse models due to need for expensive re-training after sparsification. To address this challenge, we propose a new method for efficient continual learning of sparse models (EsaCL) that can automatically prune redundant parameters without adversely impacting the model's predictive power, and circumvent the need of retraining. We conduct a theoretical analysis of loss landscapes with parameter pruning, and design a directional pruning (SDP) strategy that is informed by the sharpness of the loss function with respect to the model parameters. SDP ensures model with minimal loss of predictive accuracy, accelerating the learning of sparse models at each stage. To accelerate model update, we introduce an intelligent data selection (IDS) strategy that can identify critical instances for estimating loss landscape, yielding substantially improved data efficiency. The results of our experiments show that EsaCL achieves performance that is competitive with the state-of-the-art methods on three continual learning benchmarks, while using substantially reduced memory and computational resources.
- [706] arXiv:2401.05680 (cross-list from cs.CR) [ pdf , other ]
-
Title: Use of Graph Neural Networks in Aiding Defensive Cyber OperationsComments: 35 pages, 9 figures, 8 tablesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: In an increasingly interconnected world, where information is the lifeblood of modern society, regular cyber-attacks sabotage the confidentiality, integrity, and availability of digital systems and information. Additionally, cyber-attacks differ depending on the objective and evolve rapidly to disguise defensive systems. However, a typical cyber-attack demonstrates a series of stages from attack initiation to final resolution, called an attack life cycle. These diverse characteristics and the relentless evolution of cyber attacks have led cyber defense to adopt modern approaches like Machine Learning to bolster defensive measures and break the attack life cycle. Among the adopted ML approaches, Graph Neural Networks have emerged as a promising approach for enhancing the effectiveness of defensive measures due to their ability to process and learn from heterogeneous cyber threat data. In this paper, we look into the application of GNNs in aiding to break each stage of one of the most renowned attack life cycles, the Lockheed Martin Cyber Kill Chain. We address each phase of CKC and discuss how GNNs contribute to preparing and preventing an attack from a defensive standpoint. Furthermore, We also discuss open research areas and further improvement scopes.
- [707] arXiv:2401.05683 (cross-list from cs.GT) [ pdf , other ]
-
Title: Deep Learning Meets Mechanism Design: Key Results and Some Novel ApplicationsSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI)
Abstract: Mechanism design is essentially reverse engineering of games and involves inducing a game among strategic agents in a way that the induced game satisfies a set of desired properties in an equilibrium of the game. Desirable properties for a mechanism include incentive compatibility, individual rationality, welfare maximisation, revenue maximisation (or cost minimisation), fairness of allocation, etc. It is known from mechanism design theory that only certain strict subsets of these properties can be simultaneously satisfied exactly by any given mechanism. Often, the mechanisms required by real-world applications may need a subset of these properties that are theoretically impossible to be simultaneously satisfied. In such cases, a prominent recent approach is to use a deep learning based approach to learn a mechanism that approximately satisfies the required properties by minimizing a suitably defined loss function. In this paper, we present, from relevant literature, technical details of using a deep learning approach for mechanism design and provide an overview of key results in this topic. We demonstrate the power of this approach for three illustrative case studies: (a) efficient energy management in a vehicular network (b) resource allocation in a mobile network (c) designing a volume discount procurement auction for agricultural inputs. Section 6 concludes the paper.
- [708] arXiv:2401.05700 (cross-list from cs.CL) [ pdf , other ]
-
Title: R-BI: Regularized Batched Inputs enhance Incremental Decoding Framework for Low-Latency Simultaneous Speech TranslationAuthors: Jiaxin Guo , Zhanglin Wu , Zongyao Li , Hengchao Shang , Daimeng Wei , Xiaoyu Chen , Zhiqiang Rao , Shaojun Li , Hao YangComments: PreprintSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Incremental Decoding is an effective framework that enables the use of an offline model in a simultaneous setting without modifying the original model, making it suitable for Low-Latency Simultaneous Speech Translation. However, this framework may introduce errors when the system outputs from incomplete input. To reduce these output errors, several strategies such as Hold-$n$, LA-$n$, and SP-$n$ can be employed, but the hyper-parameter $n$ needs to be carefully selected for optimal performance. Moreover, these strategies are more suitable for end-to-end systems than cascade systems. In our paper, we propose a new adaptable and efficient policy named "Regularized Batched Inputs". Our method stands out by enhancing input diversity to mitigate output errors. We suggest particular regularization techniques for both end-to-end and cascade systems. We conducted experiments on IWSLT Simultaneous Speech Translation (SimulST) tasks, which demonstrate that our approach achieves low latency while maintaining no more than 2 BLEU points loss compared to offline systems. Furthermore, our SimulST systems attained several new state-of-the-art results in various language directions.
- [709] arXiv:2401.05730 (cross-list from cs.CV) [ pdf , other ]
-
Title: Enhancing Contrastive Learning with Efficient Combinatorial Positive PairingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In the past few years, contrastive learning has played a central role for the success of visual unsupervised representation learning. Around the same time, high-performance non-contrastive learning methods have been developed as well. While most of the works utilize only two views, we carefully review the existing multi-view methods and propose a general multi-view strategy that can improve learning speed and performance of any contrastive or non-contrastive method. We first analyze CMC's full-graph paradigm and empirically show that the learning speed of $K$-views can be increased by $_{K}\mathrm{C}_{2}$ times for small learning rate and early training. Then, we upgrade CMC's full-graph by mixing views created by a crop-only augmentation, adopting small-size views as in SwAV multi-crop, and modifying the negative sampling. The resulting multi-view strategy is called ECPP (Efficient Combinatorial Positive Pairing). We investigate the effectiveness of ECPP by applying it to SimCLR and assessing the linear evaluation performance for CIFAR-10 and ImageNet-100. For each benchmark, we achieve a state-of-the-art performance. In case of ImageNet-100, ECPP boosted SimCLR outperforms supervised learning.
- [710] arXiv:2401.05749 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way ParallelismSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We show that content on the web is often translated into many languages, and the low quality of these multi-way translations indicates they were likely created using Machine Translation (MT). Multi-way parallel, machine generated content not only dominates the translations in lower resource languages; it also constitutes a large fraction of the total web content in those languages. We also find evidence of a selection bias in the type of content which is translated into many languages, consistent with low quality English content being translated en masse into many lower resource languages, via MT. Our work raises serious concerns about training models such as multilingual large language models on both monolingual and bilingual data scraped from the web.
- [711] arXiv:2401.05772 (cross-list from cs.LG) [ pdf , other ]
-
Title: Knowledge Translation: A New Pathway for Model CompressionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Deep learning has witnessed significant advancements in recent years at the cost of increasing training, inference, and model storage overhead. While existing model compression methods strive to reduce the number of model parameters while maintaining high accuracy, they inevitably necessitate the re-training of the compressed model or impose architectural constraints. To overcome these limitations, this paper presents a novel framework, termed \textbf{K}nowledge \textbf{T}ranslation (KT), wherein a ``translation'' model is trained to receive the parameters of a larger model and generate compressed parameters. The concept of KT draws inspiration from language translation, which effectively employs neural networks to convert different languages, maintaining identical meaning. Accordingly, we explore the potential of neural networks to convert models of disparate sizes, while preserving their functionality. We propose a comprehensive framework for KT, introduce data augmentation strategies to enhance model performance despite restricted training data, and successfully demonstrate the feasibility of KT on the MNIST dataset. Code is available at \url{ this https URL }.
- [712] arXiv:2401.05778 (cross-list from cs.CL) [ pdf , other ]
-
Title: Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model SystemsAuthors: Tianyu Cui , Yanling Wang , Chuanpu Fu , Yong Xiao , Sijia Li , Xinhao Deng , Yunpeng Liu , Qinglin Zhang , Ziyi Qiu , Peiyang Li , Zhixing Tan , Junwu Xiong , Xinyu Kong , Zujie Wen , Ke Xu , Qi LiSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have strong capabilities in solving diverse natural language processing tasks. However, the safety and security issues of LLM systems have become the major obstacle to their widespread application. Many studies have extensively investigated risks in LLM systems and developed the corresponding mitigation strategies. Leading-edge enterprises such as OpenAI, Google, Meta, and Anthropic have also made lots of efforts on responsible LLMs. Therefore, there is a growing need to organize the existing studies and establish comprehensive taxonomies for the community. In this paper, we delve into four essential modules of an LLM system, including an input module for receiving prompts, a language model trained on extensive corpora, a toolchain module for development and deployment, and an output module for exporting LLM-generated content. Based on this, we propose a comprehensive taxonomy, which systematically analyzes potential risks associated with each module of an LLM system and discusses the corresponding mitigation strategies. Furthermore, we review prevalent benchmarks, aiming to facilitate the risk assessment of LLM systems. We hope that this paper can help LLM participants embrace a systematic perspective to build their responsible LLM systems.
- [713] arXiv:2401.05799 (cross-list from cs.CL) [ pdf , other ]
-
Title: Designing Heterogeneous LLM Agents for Financial Sentiment AnalysisAuthors: Frank XingComments: 15 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); General Finance (q-fin.GN)
Abstract: Large language models (LLMs) have drastically changed the possible ways to design intelligent systems, shifting the focuses from massive data acquisition and new modeling training to human alignment and strategical elicitation of the full potential of existing pre-trained models. This paradigm shift, however, is not fully realized in financial sentiment analysis (FSA), due to the discriminative nature of this task and a lack of prescriptive knowledge of how to leverage generative models in such a context. This study investigates the effectiveness of the new paradigm, i.e., using LLMs without fine-tuning for FSA. Rooted in Minsky's theory of mind and emotions, a design framework with heterogeneous LLM agents is proposed. The framework instantiates specialized agents using prior domain knowledge of the types of FSA errors and reasons on the aggregated agent discussions. Comprehensive evaluation on FSA datasets show that the framework yields better accuracies, especially when the discussions are substantial. This study contributes to the design foundations and paves new avenues for LLMs-based FSA. Implications on business and management are also discussed.
- [714] arXiv:2401.05800 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph Spatiotemporal Process for Multivariate Time Series Anomaly Detection with Missing ValuesAuthors: Yu Zheng , Huan Yee Koh , Ming Jin , Lianhua Chi , Haishuai Wang , Khoa T. Phan , Yi-Ping Phoebe Chen , Shirui Pan , Wei XiangComments: Accepted by Information FusionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The detection of anomalies in multivariate time series data is crucial for various practical applications, including smart power grids, traffic flow forecasting, and industrial process control. However, real-world time series data is usually not well-structured, posting significant challenges to existing approaches: (1) The existence of missing values in multivariate time series data along variable and time dimensions hinders the effective modeling of interwoven spatial and temporal dependencies, resulting in important patterns being overlooked during model training; (2) Anomaly scoring with irregularly-sampled observations is less explored, making it difficult to use existing detectors for multivariate series without fully-observed values. In this work, we introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and anomaly scorer to tackle the aforementioned challenges in detecting anomalies on irregularly-sampled multivariate time series. Our approach comprises two main components. First, we propose a graph spatiotemporal process based on neural controlled differential equations. This process enables effective modeling of multivariate time series from both spatial and temporal perspectives, even when the data contains missing values. Second, we present a novel distribution-based anomaly scoring mechanism that alleviates the reliance on complete uniform observations. By analyzing the predictions of the graph spatiotemporal process, our approach allows anomalies to be easily detected. Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods, regardless of whether there are missing values present in the data. Our code is available: this https URL .
- [715] arXiv:2401.05811 (cross-list from cs.CL) [ pdf , other ]
-
Title: Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource LanguagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This article introduces contrastive alignment instructions (AlignInstruct) to address two challenges in machine translation (MT) on large language models (LLMs). One is the expansion of supported languages to previously unseen ones. The second relates to the lack of data in low-resource languages. Model fine-tuning through MT instructions (MTInstruct) is a straightforward approach to the first challenge. However, MTInstruct is limited by weak cross-lingual signals inherent in the second challenge. AlignInstruct emphasizes cross-lingual supervision via a cross-lingual discriminator built using statistical word alignments. Our results based on fine-tuning the BLOOMZ models (1b1, 3b, and 7b1) in up to 24 unseen languages showed that: (1) LLMs can effectively translate unseen languages using MTInstruct; (2) AlignInstruct led to consistent improvements in translation quality across 48 translation directions involving English; (3) Discriminator-based instructions outperformed their generative counterparts as cross-lingual instructions; (4) AlignInstruct improved performance in 30 zero-shot directions.
- [716] arXiv:2401.05827 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Hallucination Benchmark in Medical Visual Question AnsweringComments: Accepted to ICLR 2024 Tiny Papers(Notable)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: The recent success of large language and vision models (LLVMs) on vision question answering (VQA), particularly their applications in medicine (Med-VQA), has shown a great potential of realizing effective visual assistants for healthcare. However, these models are not extensively tested on the hallucination phenomenon in clinical settings. Here, we created a hallucination benchmark of medical images paired with question-answer sets and conducted a comprehensive evaluation of the state-of-the-art models. The study provides an in-depth analysis of current models' limitations and reveals the effectiveness of various prompting strategies.
- [717] arXiv:2401.05831 (cross-list from cs.LG) [ pdf , other ]
-
Title: Silhouette Aggregation: From Micro to MacroSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Silhouette coefficient is an established internal clustering evaluation measure that produces a score per data point, assessing the quality of its clustering assignment. To assess the quality of the clustering of the whole dataset, the scores of all the points in the dataset are either (micro) averaged into a single value or averaged at the cluster level and then (macro) averaged. As we illustrate in this work, by using a synthetic example, the micro-averaging strategy is sensitive both to cluster imbalance and outliers (background noise) while macro-averaging is far more robust to both. Furthermore, the latter allows cluster-balanced sampling which yields robust computation of the silhouette score. By conducting an experimental study on eight real-world datasets, estimating the ground truth number of clusters, we show that both coefficients, micro and macro, should be considered.
- [718] arXiv:2401.05840 (cross-list from cs.HC) [ pdf , other ]
-
Title: Decoding AI's Nudge: A Unified Framework to Predict Human Behavior in AI-assisted Decision MakingComments: AAAI 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: With the rapid development of AI-based decision aids, different forms of AI assistance have been increasingly integrated into the human decision making processes. To best support humans in decision making, it is essential to quantitatively understand how diverse forms of AI assistance influence humans' decision making behavior. To this end, much of the current research focuses on the end-to-end prediction of human behavior using ``black-box'' models, often lacking interpretations of the nuanced ways in which AI assistance impacts the human decision making process. Meanwhile, methods that prioritize the interpretability of human behavior predictions are often tailored for one specific form of AI assistance, making adaptations to other forms of assistance difficult. In this paper, we propose a computational framework that can provide an interpretable characterization of the influence of different forms of AI assistance on decision makers in AI-assisted decision making. By conceptualizing AI assistance as the ``{\em nudge}'' in human decision making processes, our approach centers around modelling how different forms of AI assistance modify humans' strategy in weighing different information in making their decisions. Evaluations on behavior data collected from real human decision makers show that the proposed framework outperforms various baselines in accurately predicting human behavior in AI-assisted decision making. Based on the proposed framework, we further provide insights into how individuals with different cognitive styles are nudged by AI assistance differently.
- [719] arXiv:2401.05849 (cross-list from cs.LG) [ pdf , other ]
-
Title: Inferring Intentions to Speak Using Accelerometer Data In-the-WildSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: Humans have good natural intuition to recognize when another person has something to say. It would be interesting if an AI can also recognize intentions to speak. Especially in scenarios when an AI is guiding a group discussion, this can be a useful skill. This work studies the inference of successful and unsuccessful intentions to speak from accelerometer data. This is chosen because it is privacy-preserving and feasible for in-the-wild settings since it can be placed in a smart badge. Data from a real-life social networking event is used to train a machine-learning model that aims to infer intentions to speak. A subset of unsuccessful intention-to-speak cases in the data is annotated. The model is trained on the successful intentions to speak and evaluated on both the successful and unsuccessful cases. In conclusion, there is useful information in accelerometer data, but not enough to reliably capture intentions to speak. For example, posture shifts are correlated with intentions to speak, but people also often shift posture without having an intention to speak, or have an intention to speak without shifting their posture. More modalities are likely needed to reliably infer intentions to speak.
- [720] arXiv:2401.05870 (cross-list from cs.CV) [ pdf , other ]
-
Title: HiCAST: Highly Customized Arbitrary Style Transfer with Adapter Enhanced Diffusion ModelsAuthors: Hanzhang Wang , Haoran Wang , Jinze Yang , Zhongrui Yu , Zeke Xie , Lei Tian , Xinyan Xiao , Junjun Jiang , Xianming Liu , Mingming SunSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The goal of Arbitrary Style Transfer (AST) is injecting the artistic features of a style reference into a given image/video. Existing methods usually focus on pursuing the balance between style and content, whereas ignoring the significant demand for flexible and customized stylization results and thereby limiting their practical application. To address this critical issue, a novel AST approach namely HiCAST is proposed, which is capable of explicitly customizing the stylization results according to various source of semantic clues. In the specific, our model is constructed based on Latent Diffusion Model (LDM) and elaborately designed to absorb content and style instance as conditions of LDM. It is characterized by introducing of \textit{Style Adapter}, which allows user to flexibly manipulate the output results by aligning multi-level style information and intrinsic knowledge in LDM. Lastly, we further extend our model to perform video AST. A novel learning objective is leveraged for video diffusion model training, which significantly improve cross-frame temporal consistency in the premise of maintaining stylization strength. Qualitative and quantitative comparisons as well as comprehensive user studies demonstrate that our HiCAST outperforms the existing SoTA methods in generating visually plausible stylization results.
- [721] arXiv:2401.05895 (cross-list from cs.LG) [ pdf , other ]
-
Title: Binary Linear Tree Commitment-based Ownership Protection for Distributed Machine LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Distributed machine learning enables parallel training of extensive datasets by delegating computing tasks across multiple workers. Despite the cost reduction benefits of distributed machine learning, the dissemination of final model weights often leads to potential conflicts over model ownership as workers struggle to substantiate their involvement in the training computation. To address the above ownership issues and prevent accidental failures and malicious attacks, verifying the computational integrity and effectiveness of workers becomes particularly crucial in distributed machine learning. In this paper, we proposed a novel binary linear tree commitment-based ownership protection model to ensure computational integrity with limited overhead and concise proof. Due to the frequent updates of parameters during training, our commitment scheme introduces a maintainable tree structure to reduce the costs of updating proofs. Distinguished from SNARK-based verifiable computation, our model achieves efficient proof aggregation by leveraging inner product arguments. Furthermore, proofs of model weights are watermarked by worker identity keys to prevent commitments from being forged or duplicated. The performance analysis and comparison with SNARK-based hash commitments validate the efficacy of our model in preserving computational integrity within distributed machine learning.
- [722] arXiv:2401.05914 (cross-list from cs.CL) [ pdf , other ]
-
Title: How Teachers Can Use Large Language Models and Bloom's Taxonomy to Create Educational QuizzesComments: 8 pages, 8 figures. Accepted to the main track of the EAAI-24: The 14th Symposium on Educational Advances in Artificial IntelligenceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Question generation (QG) is a natural language processing task with an abundance of potential benefits and use cases in the educational domain. In order for this potential to be realized, QG systems must be designed and validated with pedagogical needs in mind. However, little research has assessed or designed QG approaches with the input from real teachers or students. This paper applies a large language model-based QG approach where questions are generated with learning goals derived from Bloom's taxonomy. The automatically generated questions are used in multiple experiments designed to assess how teachers use them in practice. The results demonstrate that teachers prefer to write quizzes with automatically generated questions, and that such quizzes have no loss in quality compared to handwritten versions. Further, several metrics indicate that automatically generated questions can even improve the quality of the quizzes created, showing the promise for large scale use of QG in the classroom setting.
- [723] arXiv:2401.05925 (cross-list from cs.CV) [ pdf , other ]
-
Title: CoSSegGaussians: Compact and Swift Scene Segmenting 3D Gaussians with Dual Feature FusionComments: 9 pages, 8 figures, correct writing detailsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: We propose Compact and Swift Segmenting 3D Gaussians(CoSSegGaussians), a method for compact 3D-consistent scene segmentation at fast rendering speed with only RGB images input. Previous NeRF-based segmentation methods have relied on time-consuming neural scene optimization. While recent 3D Gaussian Splatting has notably improved speed, existing Gaussian-based segmentation methods struggle to produce compact masks, especially in zero-shot segmentation. This issue probably stems from their straightforward assignment of learnable parameters to each Gaussian, resulting in a lack of robustness against cross-view inconsistent 2D machine-generated labels. Our method aims to address this problem by employing Dual Feature Fusion Network as Gaussians' segmentation field. Specifically, we first optimize 3D Gaussians under RGB supervision. After Gaussian Locating, DINO features extracted from images are applied through explicit unprojection, which are further incorporated with spatial features from the efficient point cloud processing network. Feature aggregation is utilized to fuse them in a global-to-local strategy for compact segmentation features. Experimental results show that our model outperforms baselines on both semantic and panoptic zero-shot segmentation task, meanwhile consumes less than 10% inference time compared to NeRF-based methods. Code and more results will be available at this https URL
- [724] arXiv:2401.05930 (cross-list from cs.CL) [ pdf , other ]
-
Title: SH2: Self-Highlighted Hesitation Helps You Decode More TruthfullySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) demonstrate great performance in text generation. However, LLMs are still suffering from hallucinations. In this work, we propose an inference-time method, Self-Highlighted Hesitation (SH2), to help LLMs decode more truthfully. SH2 is based on a simple fact rooted in information theory that for an LLM, the tokens predicted with lower probabilities are prone to be more informative than others. Our analysis shows that the tokens assigned with lower probabilities by an LLM are more likely to be closely related to factual information, such as nouns, proper nouns, and adjectives. Therefore, we propose to ''highlight'' the factual information by selecting the tokens with the lowest probabilities and concatenating them to the original context, thus forcing the model to repeatedly read and hesitate on these tokens before generation. During decoding, we also adopt contrastive decoding to emphasize the difference in the output probabilities brought by the hesitation. Experimental results demonstrate that our SH2, requiring no additional data or models, can effectively help LLMs elicit factual knowledge and distinguish hallucinated contexts. Significant and consistent improvements are achieved by SH2 for LLaMA-7b, LLaMA2-7b and Mistral-7b on multiple hallucination tasks.
- [725] arXiv:2401.05932 (cross-list from cs.CE) [ pdf , other ]
-
Title: DiffDA: a Diffusion model for weather-scale Data AssimilationSubjects: Computational Engineering, Finance, and Science (cs.CE) ; Artificial Intelligence (cs.AI)
Abstract: The generation of initial conditions via accurate data assimilation is crucial for weather forecasting and climate modeling. We propose DiffDA as a denoising diffusion model capable of assimilating atmospheric variables using predicted states and sparse observations. Acknowledging the similarity between a weather forecast model and a denoising diffusion model dedicated to weather applications, we adapt the pretrained GraphCast neural network as the backbone of the diffusion model. Through experiments based on simulated observations from the ERA5 reanalysis dataset, our method can produce assimilated global atmospheric data consistent with observations at 0.25 deg (~30km) resolution globally. This marks the highest resolution achieved by ML data assimilation models. The experiments also show that the initial conditions assimilated from sparse observations (less than 0.77% of gridded data) and 48-hour forecast can be used for forecast models with a loss of lead time of at most 24 hours compared to initial conditions from state-of-the-art data assimilation in ERA5. This enables the application of the method to real-world applications, such as creating reanalysis datasets with autoregressive data assimilation.
- [726] arXiv:2401.05939 (cross-list from cs.IR) [ pdf , other ]
-
Title: DREQ: Document Re-Ranking Using Entity-based Query UnderstandingComments: To be presented as a full paper at ECIR 2024 in Glasgpow, UKSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: While entity-oriented neural IR models have advanced significantly, they often overlook a key nuance: the varying degrees of influence individual entities within a document have on its overall relevance. Addressing this gap, we present DREQ, an entity-oriented dense document re-ranking model. Uniquely, we emphasize the query-relevant entities within a document's representation while simultaneously attenuating the less relevant ones, thus obtaining a query-specific entity-centric document representation. We then combine this entity-centric document representation with the text-centric representation of the document to obtain a "hybrid" representation of the document. We learn a relevance score for the document using this hybrid representation. Using four large-scale benchmarks, we show that DREQ outperforms state-of-the-art neural and non-neural re-ranking methods, highlighting the effectiveness of our entity-oriented representation approach.
- [727] arXiv:2401.05940 (cross-list from cs.SE) [ pdf , other ]
-
Title: Mutation-based Consistency Testing for Evaluating the Code Understanding Capability of LLMsComments: This is an author-preprint. The published version will be included in the proceedings of CAIN 2024 (co-located with ICSE 2024)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have shown remarkable capabilities in processing both natural and programming languages, which have enabled various applications in software engineering, such as requirement engineering, code generation, and software testing. However, existing code generation benchmarks do not necessarily assess the code understanding performance of LLMs, especially for the subtle inconsistencies that may arise between code and its semantics described in natural language.
In this paper, we propose a novel method to systematically assess the code understanding performance of LLMs, particularly focusing on subtle differences between code and its descriptions, by introducing code mutations to existing code generation datasets. Code mutations are small changes that alter the semantics of the original code, creating a mismatch with the natural language description. We apply different types of code mutations, such as operator replacement and statement deletion, to generate inconsistent code-description pairs. We then use these pairs to test the ability of LLMs to correctly detect the inconsistencies.
We propose a new LLM testing method, called Mutation-based Consistency Testing (MCT), and conduct a case study on the two popular LLMs, GPT-3.5 and GPT-4, using the state-of-the-art code generation benchmark, HumanEval-X, which consists of six programming languages (Python, C++, Java, Go, JavaScript, and Rust). We compare the performance of the LLMs across different types of code mutations and programming languages and analyze the results. We find that the LLMs show significant variation in their code understanding performance and that they have different strengths and weaknesses depending on the mutation type and language. - [728] arXiv:2401.05946 (cross-list from cs.LG) [ pdf , other ]
-
Title: Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed EnvironmentsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Despite their stellar performance on a wide range of tasks, including in-context tasks only revealed during inference, vanilla transformers and variants trained for next-token predictions (a) do not learn an explicit world model of their environment which can be flexibly queried and (b) cannot be used for planning or navigation. In this paper, we consider partially observed environments (POEs), where an agent receives perceptually aliased observations as it navigates, which makes path planning hard. We introduce a transformer with (multiple) discrete bottleneck(s), TDB, whose latent codes learn a compressed representation of the history of observations and actions. After training a TDB to predict the future observation(s) given the history, we extract interpretable cognitive maps of the environment from its active bottleneck(s) indices. These maps are then paired with an external solver to solve (constrained) path planning problems. First, we show that a TDB trained on POEs (a) retains the near perfect predictive performance of a vanilla transformer or an LSTM while (b) solving shortest path problems exponentially faster. Second, a TDB extracts interpretable representations from text datasets, while reaching higher in-context accuracy than vanilla sequence models. Finally, in new POEs, a TDB (a) reaches near-perfect in-context accuracy, (b) learns accurate in-context cognitive maps (c) solves in-context path planning problems.
- [729] arXiv:2401.05949 (cross-list from cs.CL) [ pdf , other ]
-
Title: Universal Vulnerabilities in Large Language Models: Backdoor Attacks for In-context LearningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: In-context learning, a paradigm bridging the gap between pre-training and fine-tuning, has demonstrated high efficacy in several NLP tasks, especially in few-shot settings. Despite being widely applied, in-context learning is vulnerable to malicious attacks. In this work, we raise security concerns regarding this paradigm. Our studies demonstrate that an attacker can manipulate the behavior of large language models by poisoning the demonstration context, without the need for fine-tuning the model. Specifically, we design a new backdoor attack method, named ICLAttack, to target large language models based on in-context learning. Our method encompasses two types of attacks: poisoning demonstration examples and poisoning demonstration prompts, which can make models behave in alignment with predefined intentions. ICLAttack does not require additional fine-tuning to implant a backdoor, thus preserving the model's generality. Furthermore, the poisoned examples are correctly labeled, enhancing the natural stealth of our attack method. Extensive experimental results across several language models, ranging in size from 1.3B to 180B parameters, demonstrate the effectiveness of our attack method, exemplified by a high average attack success rate of 95.0% across the three datasets on OPT models.
- [730] arXiv:2401.05964 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: An attempt to generate new bridge types from latent space of PixelCNNAuthors: Hongjun ZhangComments: 7 pages, 8 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Try to generate new bridge types using generative artificial intelligence technology. Using symmetric structured image dataset of three-span beam bridge, arch bridge, cable-stayed bridge and suspension bridge , based on Python programming language, TensorFlow and Keras deep learning platform framework , PixelCNN is constructed and trained. The model can capture the statistical structure of the images and calculate the probability distribution of the next pixel when the previous pixels are given. From the obtained latent space sampling, new bridge types different from the training dataset can be generated. PixelCNN can organically combine different structural components on the basis of human original bridge types, creating new bridge types that have a certain degree of human original ability. Autoregressive models cannot understand the meaning of the sequence, while multimodal models combine regression and autoregressive models to understand the sequence. Multimodal models should be the way to achieve artificial general intelligence in the future.
- [731] arXiv:2401.05969 (cross-list from cs.LG) [ pdf , other ]
-
Title: Spatial-Aware Deep Reinforcement Learning for the Traveling Officer ProblemComments: SIAM SDM 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The traveling officer problem (TOP) is a challenging stochastic optimization task. In this problem, a parking officer is guided through a city equipped with parking sensors to fine as many parking offenders as possible. A major challenge in TOP is the dynamic nature of parking offenses, which randomly appear and disappear after some time, regardless of whether they have been fined. Thus, solutions need to dynamically adjust to currently fineable parking offenses while also planning ahead to increase the likelihood that the officer arrives during the offense taking place. Though various solutions exist, these methods often struggle to take the implications of actions on the ability to fine future parking violations into account. This paper proposes SATOP, a novel spatial-aware deep reinforcement learning approach for TOP. Our novel state encoder creates a representation of each action, leveraging the spatial relationships between parking spots, the agent, and the action. Furthermore, we propose a novel message-passing module for learning future inter-action correlations in the given environment. Thus, the agent can estimate the potential to fine further parking violations after executing an action. We evaluate our method using an environment based on real-world data from Melbourne. Our results show that SATOP consistently outperforms state-of-the-art TOP agents and is able to fine up to 22% more parking offenses.
- [732] arXiv:2401.05975 (cross-list from cs.IR) [ pdf , other ]
-
Title: End-to-end Learnable Clustering for Intent Learning in RecommendationAuthors: Yue Liu , Shihao Zhu , Jun Xia , Yingwei Ma , Jian Ma , Wenliang Zhong , Xinwang Liu , Guannan Zhang , Kejun ZhangComments: 24 pagesSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Intent learning, which aims to learn users' intents for user understanding and item recommendation, has become a hot research spot in recent years. However, the existing methods suffer from complex and cumbersome alternating optimization, limiting the performance and scalability. To this end, we propose a novel intent learning method termed \underline{ELCRec}, by unifying behavior representation learning into an \underline{E}nd-to-end \underline{L}earnable \underline{C}lustering framework, for effective and efficient \underline{Rec}ommendation. Concretely, we encode users' behavior sequences and initialize the cluster centers (latent intents) as learnable neurons. Then, we design a novel learnable clustering module to separate different cluster centers, thus decoupling users' complex intents. Meanwhile, it guides the network to learn intents from behaviors by forcing behavior embeddings close to cluster centers. This allows simultaneous optimization of recommendation and clustering via mini-batch data. Moreover, we propose intent-assisted contrastive learning by using cluster centers as self-supervision signals, further enhancing mutual promotion. Both experimental results and theoretical analyses demonstrate the superiority of ELCRec from six perspectives. Compared to the runner-up, ELCRec improves NDCG@5 by 8.9\% and reduces computational costs by 22.5\% on Beauty dataset. Furthermore, due to the scalability and universal applicability, we deploy this method on the industrial recommendation system with 130 million page views and achieve promising results.
- [733] arXiv:2401.05998 (cross-list from cs.CL) [ pdf , other ]
-
Title: Combating Adversarial Attacks with Multi-Agent DebateSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While state-of-the-art language models have achieved impressive results, they remain susceptible to inference-time adversarial attacks, such as adversarial prompts generated by red teams arXiv:2209.07858 . One approach proposed to improve the general quality of language model generations is multi-agent debate, where language models self-evaluate through discussion and feedback arXiv:2305.14325 . We implement multi-agent debate between current state-of-the-art language models and evaluate models' susceptibility to red team attacks in both single- and multi-agent settings. We find that multi-agent debate can reduce model toxicity when jailbroken or less capable models are forced to debate with non-jailbroken or more capable models. We also find marginal improvements through the general usage of multi-agent interactions. We further perform adversarial prompt content classification via embedding clustering, and analyze the susceptibility of different models to different types of attack topics.
- [734] arXiv:2401.06013 (cross-list from cs.CV) [ pdf , other ]
-
Title: Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic SurgeryComments: Accepted by IPCAI 2024 (IJCAR Special Issue)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Purpose: Depth estimation in robotic surgery is vital in 3D reconstruction, surgical navigation and augmented reality visualization. Although the foundation model exhibits outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works observed its limitations in medical and surgical domain-specific applications. This work presents a low-ranked adaptation (LoRA) of the foundation model for surgical depth estimation. Methods: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of the DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt with surgery-specific domain knowledge instead of conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and only optimize the LoRA layers and depth decoder to integrate features from the surgical scene. Results: Our model is extensively validated on a MICCAI challenge dataset of SCARED, which is collected from da Vinci Xi endoscope surgery. We empirically show that Surgical-DINO significantly outperforms all the state-of-the-art models in endoscopic depth estimation tasks. The analysis with ablation studies has shown evidence of the remarkable effect of our LoRA layers and adaptation. Conclusion: Surgical-DINO shed some light on the successful adaptation of the foundation models into the surgical domain for depth estimation. There is clear evidence in the results that zero-shot prediction on pre-trained weights in computer vision datasets or naive fine-tuning is not sufficient to use the foundation model in the surgical domain directly. Code is available at this https URL .
- [735] arXiv:2401.06048 (cross-list from cs.SI) [ pdf , other ]
-
Title: On the Power of Graph Neural Networks and Feature Augmentation Strategies to Classify Social NetworksComments: 12 pages, 3 figures, 4 tables, The code for the experiments, together with the datasets used, is available from, this https URL walidgeuttala/Synthetic-Benchmark-for-Graph-ClassificationSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper studies four Graph Neural Network architectures (GNNs) for a graph classification task on a synthetic dataset created using classic generative models of Network Science. Since the synthetic networks do not contain (node or edge) features, five different augmentation strategies (artificial feature types) are applied to nodes. All combinations of the 4 GNNs (GCN with Hierarchical and Global aggregation, GIN and GATv2) and the 5 feature types (constant 1, noise, degree, normalized degree and ID -- a vector of the number of cycles of various lengths) are studied and their performances compared as a function of the hidden dimension of artificial neural networks used in the GNNs. The generalisation ability of these models is also analysed using a second synthetic network dataset (containing networks of different sizes).Our results point towards the balanced importance of the computational power of the GNN architecture and the the information level provided by the artificial features. GNN architectures with higher computational power, like GIN and GATv2, perform well for most augmentation strategies. On the other hand, artificial features with higher information content, like ID or degree, not only consistently outperform other augmentation strategies, but can also help GNN architectures with lower computational power to achieve good performance.
- [736] arXiv:2401.06059 (cross-list from cs.CL) [ pdf , other ]
-
Title: Investigating Data Contamination for Pre-training Language ModelsAuthors: Minhao Jiang , Ken Ziyu Liu , Ming Zhong , Rylan Schaeffer , Siru Ouyang , Jiawei Han , Sanmi KoyejoComments: 16 pages, 5 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Language models pre-trained on web-scale corpora demonstrate impressive capabilities on diverse downstream tasks. However, there is increasing concern whether such capabilities might arise from evaluation datasets being included in the pre-training corpus -- a phenomenon known as \textit{data contamination} -- in a manner that artificially increases performance. There has been little understanding of how this potential contamination might influence LMs' performance on downstream tasks. In this paper, we explore the impact of data contamination at the pre-training stage by pre-training a series of GPT-2 models \textit{from scratch}. We highlight the effect of both text contamination (\textit{i.e.}\ input text of the evaluation samples) and ground-truth contamination (\textit{i.e.}\ the prompts asked on the input and the desired outputs) from evaluation data. We also investigate the effects of repeating contamination for various downstream tasks. Additionally, we examine the prevailing n-gram-based definitions of contamination within current LLM reports, pinpointing their limitations and inadequacy. Our findings offer new insights into data contamination's effects on language model capabilities and underscore the need for independent, comprehensive contamination assessments in LLM studies.
- [737] arXiv:2401.06086 (cross-list from cs.LG) [ pdf , other ]
-
Title: XGBoost Learning of Dynamic Wager Placement for In-Play Betting on an Agent-Based Model of a Sports Betting ExchangeComments: Accepted for presentation/publication at the 16th International Conference on Agents and Artificial Intelligence (ICAART2024); Rome, Italy, 24-26 February 2024. 13 pages; 18 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Multiagent Systems (cs.MA)
Abstract: We present first results from the use of XGBoost, a highly effective machine learning (ML) method, within the Bristol Betting Exchange (BBE), an open-source agent-based model (ABM) designed to simulate a contemporary sports-betting exchange with in-play betting during track-racing events such as horse races. We use the BBE ABM and its array of minimally-simple bettor-agents as a synthetic data generator which feeds into our XGBoost ML system, with the intention that XGBoost discovers profitable dynamic betting strategies by learning from the more profitable bets made by the BBE bettor-agents. After this XGBoost training, which results in one or more decision trees, a bettor-agent with a betting strategy determined by the XGBoost-learned decision tree(s) is added to the BBE ABM and made to bet on a sequence of races under various conditions and betting-market scenarios, with profitability serving as the primary metric of comparison and evaluation. Our initial findings presented here show that XGBoost trained in this way can indeed learn profitable betting strategies, and can generalise to learn strategies that outperform each of the set of strategies used for creation of the training data. To foster further research and enhancements, the complete version of our extended BBE, including the XGBoost integration, has been made freely available as an open-source release on GitHub.
- [738] arXiv:2401.06088 (cross-list from cs.CL) [ pdf , other ]
-
Title: Autocompletion of Chief Complaints in the Electronic Health Records using Large Language ModelsComments: IEEE BigData 2023 - Sorrento, Italy. 10 Pages, 4 Figures, 5 TablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The Chief Complaint (CC) is a crucial component of a patient's medical record as it describes the main reason or concern for seeking medical care. It provides critical information for healthcare providers to make informed decisions about patient care. However, documenting CCs can be time-consuming for healthcare providers, especially in busy emergency departments. To address this issue, an autocompletion tool that suggests accurate and well-formatted phrases or sentences for clinical notes can be a valuable resource for triage nurses. In this study, we utilized text generation techniques to develop machine learning models using CC data. In our proposed work, we train a Long Short-Term Memory (LSTM) model and fine-tune three different variants of Biomedical Generative Pretrained Transformers (BioGPT), namely microsoft/biogpt, microsoft/BioGPT-Large, and microsoft/BioGPT-Large-PubMedQA. Additionally, we tune a prompt by incorporating exemplar CC sentences, utilizing the OpenAI API of GPT-4. We evaluate the models' performance based on the perplexity score, modified BERTScore, and cosine similarity score. The results show that BioGPT-Large exhibits superior performance compared to the other models. It consistently achieves a remarkably low perplexity score of 1.65 when generating CC, whereas the baseline LSTM model achieves the best perplexity score of 170. Further, we evaluate and assess the proposed models' performance and the outcome of GPT-4.0. Our study demonstrates that utilizing LLMs such as BioGPT, leads to the development of an effective autocompletion tool for generating CC documentation in healthcare settings.
- [739] arXiv:2401.06102 (cross-list from cs.CL) [ pdf , other ]
-
Title: Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Inspecting the information encoded in hidden representations of large language models (LLMs) can explain models' behavior and verify their alignment with human values. Given the capabilities of LLMs in generating human-understandable text, we propose leveraging the model itself to explain its internal representations in natural language. We introduce a framework called Patchscopes and show how it can be used to answer a wide range of questions about an LLM's computation. We show that prior interpretability methods based on projecting representations into the vocabulary space and intervening on the LLM computation can be viewed as instances of this framework. Moreover, several of their shortcomings such as failure in inspecting early layers or lack of expressivity can be mitigated by Patchscopes. Beyond unifying prior inspection techniques, Patchscopes also opens up new possibilities such as using a more capable model to explain the representations of a smaller model, and unlocks new applications such as self-correction in multi-hop reasoning.
- [740] arXiv:2401.06122 (cross-list from cs.LG) [ pdf , other ]
-
Title: Manipulating Feature Visualizations with Gradient SlingshotsAuthors: Dilyara Bareeva , Marina M.-C. Höhne , Alexander Warnecke , Lukas Pirch , Klaus-Robert Müller , Konrad Rieck , Kirill BykovSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Deep Neural Networks (DNNs) are capable of learning complex and versatile representations, however, the semantic nature of the learned concepts remains unknown. A common method used to explain the concepts learned by DNNs is Activation Maximization (AM), which generates a synthetic input signal that maximally activates a particular neuron in the network. In this paper, we investigate the vulnerability of this approach to adversarial model manipulations and introduce a novel method for manipulating feature visualization without altering the model architecture or significantly impacting the model's decision-making process. We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of specific neurons by masking the original explanations of neurons with chosen target explanations during model auditing. As a remedy, we propose a protective measure against such manipulations and provide quantitative evidence which substantiates our findings.
- [741] arXiv:2401.06127 (cross-list from cs.CV) [ pdf , other ]
-
Title: E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image TranslationAuthors: Yifan Gong , Zheng Zhan , Qing Jin , Yanyu Li , Yerlan Idelbayev , Xian Liu , Andrey Zharkov , Kfir Aberman , Sergey Tulyakov , Yanzhi Wang , Jian RenSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models, such as Stable Diffusion, to generate paired datasets used for training generative adversarial networks (GANs). This approach notably alleviates the stringent requirements typically imposed by high-end commercial GPUs for performing image editing with diffusion models. However, unlike text-to-image diffusion models, each distilled GAN is specialized for a specific image editing task, necessitating costly training efforts to obtain models for various concepts. In this work, we introduce and address a novel research direction: can the process of distilling GANs from diffusion models be made significantly more efficient? To achieve this goal, we propose a series of innovative techniques. First, we construct a base GAN model with generalized features, adaptable to different concepts through fine-tuning, eliminating the need for training from scratch. Second, we identify crucial layers within the base GAN model and employ Low-Rank Adaptation (LoRA) with a simple yet effective rank search process, rather than fine-tuning the entire base model. Third, we investigate the minimal amount of data necessary for fine-tuning, further reducing the overall training time. Extensive experiments show that we can efficiently empower GANs with the ability to perform real-time high-quality image editing on mobile devices with remarkable reduced training cost and storage for each concept.
- [742] arXiv:2401.06137 (cross-list from cs.NE) [ pdf , other ]
-
Title: QuasiNet: a neural network with trainable product layersComments: This work was funded by the Horizon-Widera-2021 European Twinning project TERAIS G.A. n. 101079338. Presented at International Conference on Artificial Neural Networks (ICANN) 2023. Accepted:1.7.2023 Published:26.9.2023. Code: this https URLSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Classical neural networks achieve only limited convergence in hard problems such as XOR or parity when the number of hidden neurons is small. With the motivation to improve the success rate of neural networks in these problems, we propose a new neural network model inspired by existing neural network models with so called product neurons and a learning rule derived from classical error backpropagation, which elegantly solves the problem of mutually exclusive situations. Unlike existing product neurons, which have weights that are preset and not adaptable, our product layers of neurons also do learn. We tested the model and compared its success rate to a classical multilayer perceptron in the aforementioned problems as well as in other hard problems such as the two spirals. Our results indicate that our model is clearly more successful than the classical MLP and has the potential to be used in many tasks and applications.
- [743] arXiv:2401.06143 (cross-list from cs.CV) [ pdf , other ]
-
Title: Redefining Recon: Bridging Gaps with UAVs, 360 degree Cameras, and Neural Radiance FieldsAuthors: Hartmut Surmann , Niklas Digakis , Jan-Nicklas Kremer , Julien Meine , Max Schulte , Niklas VoigtComments: 6 pages, published at IEEE International Symposium on Safety,Security,and Rescue Robotics SSRR2023 in FUKUSHIMA, November 13-15 2023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In the realm of digital situational awareness during disaster situations, accurate digital representations, like 3D models, play an indispensable role. To ensure the safety of rescue teams, robotic platforms are often deployed to generate these models. In this paper, we introduce an innovative approach that synergizes the capabilities of compact Unmaned Arial Vehicles (UAVs), smaller than 30 cm, equipped with 360 degree cameras and the advances of Neural Radiance Fields (NeRFs). A NeRF, a specialized neural network, can deduce a 3D representation of any scene using 2D images and then synthesize it from various angles upon request. This method is especially tailored for urban environments which have experienced significant destruction, where the structural integrity of buildings is compromised to the point of barring entry-commonly observed post-earthquakes and after severe fires. We have tested our approach through recent post-fire scenario, underlining the efficacy of NeRFs even in challenging outdoor environments characterized by water, snow, varying light conditions, and reflective surfaces.
- [744] arXiv:2401.06153 (cross-list from cs.NE) [ pdf , other ]
-
Title: Multi-Modal Optimization with k-Cluster Big Bang-Big Crunch AlgorithmComments: 17 pages, 7 figuresSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: Multi-modal optimization is often encountered in engineering problems, especially when different and alternative solutions are sought. Evolutionary algorithms can efficiently tackle multi-modal optimization thanks to their features such as the concept of population, exploration/exploitation, and being suitable for parallel computation.
This paper introduces a multi-modal optimization version of the Big Bang-Big Crunch algorithm based on clustering, namely, k-BBBC. This algorithm guarantees a complete convergence of the entire population, retrieving on average the 99\% of local optima for a specific problem. Additionally, we introduce two post-processing methods to (i) identify the local optima in a set of retrieved solutions (i.e., a population), and (ii) quantify the number of correctly retrieved optima against the expected ones (i.e., success rate).
Our results show that k-BBBC performs well even with problems having a large number of optima (tested on 379 optima) and high dimensionality (tested on 32 decision variables). When compared to other multi-modal optimization methods, it outperforms them in terms of accuracy (in both search and objective space) and success rate (number of correctly retrieved optima) -- especially when elitism is applied. Lastly, we validated our proposed post-processing methods by comparing their success rate to the actual one. Results suggest that these methods can be used to evaluate the performance of a multi-modal optimization algorithm by correctly identifying optima and providing an indication of success -- without the need to know where the optima are located in the search space. - [745] arXiv:2401.06157 (cross-list from cs.CV) [ pdf , other ]
-
Title: UDEEP: Edge-based Computer Vision for In-Situ Underwater Crayfish and Plastic DetectionAuthors: Dennis Monari , Jack Larkin , Pedro Machado , Jordan J. Bird , Isibor Kennedy Ihianle , Salisu Wada Yahaya , Farhad Fassihi Tash , Md Mahmudul Hasan , Ahmad LotfiSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Invasive signal crayfish have a detrimental impact on ecosystems. They spread the fungal-type crayfish plague disease (Aphanomyces astaci) that is lethal to the native white clawed crayfish, the only native crayfish species in Britain. Invasive signal crayfish extensively burrow, causing habitat destruction, erosion of river banks and adverse changes in water quality, while also competing with native species for resources and leading to declines in native populations. Moreover, pollution exacerbates the vulnerability of White-clawed crayfish, with their populations declining by over 90% in certain English counties, making them highly susceptible to extinction. To safeguard aquatic ecosystems, it is imperative to address the challenges posed by invasive species and discarded plastics in the United Kingdom's river ecosystem's. The UDEEP platform can play a crucial role in environmental monitoring by performing on-the-fly classification of Signal crayfish and plastic debris while leveraging the efficacy of AI, IoT devices and the power of edge computing (i.e., NJN). By providing accurate data on the presence, spread and abundance of these species, the UDEEP platform can contribute to monitoring efforts and aid in mitigating the spread of invasive species.
- [746] arXiv:2401.06160 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Future-proofing Education: A Prototype for Simulating Oral Examinations Using Large Language ModelsAuthors: André NitzeComments: 6 pages, 3 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: This study explores the impact of Large Language Models (LLMs) in higher education, focusing on an automated oral examination simulation using a prototype. The design considerations of the prototype are described, and the system is evaluated with a select group of educators and students. Technical and pedagogical observations are discussed. The prototype proved to be effective in simulating oral exams, providing personalized feedback, and streamlining educators' workloads. The promising results of the prototype show the potential for LLMs in democratizing education, inclusion of diverse student populations, and improvement of teaching quality and efficiency.
- [747] arXiv:2401.06161 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Trustworthy human-centric based Automated Decision-Making SystemsAuthors: Marcelino Cabrera , Carlos Cruz , Pavel Novoa-Hernández , David A. Pelta , José Luis VerdegayComments: 16 pages, 1 TableSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Automated Decision-Making Systems (ADS) have become pervasive across various fields, activities, and occupations, to enhance performance. However, this widespread adoption introduces potential risks, including the misuse of ADS. Such misuse may manifest when ADS is employed in situations where it is unnecessary or when essential requirements, conditions, and terms are overlooked, leading to unintended consequences. This research paper presents a thorough examination of the implications, distinctions, and ethical considerations associated with digitalization, digital transformation, and the utilization of ADS in contemporary society and future contexts. Emphasis is placed on the imperative need for regulation, transparency, and ethical conduct in the deployment of ADS.
- [748] arXiv:2401.06162 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: A debiasing technique for place-based algorithmic patrol managementAuthors: Alexander Einarsson (1), Simen Oestmo (2), Lester Wollman (2), Duncan Purves (3), Ryan Jenkins (4) ((1) Northwestern University (2) SoundThinking Inc. (3) University of Florida (4) California Polytechnic State University)Comments: 20 pages (91 Appendix pages), 6 figures (20 supplementary figures), 14 supplementary tablesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, there has been a revolution in data-driven policing. With that has come scrutiny on how bias in historical data affects algorithmic decision making. In this exploratory work, we introduce a debiasing technique for place-based algorithmic patrol management systems. We show that the technique efficiently eliminates racially biased features while retaining high accuracy in the models. Finally, we provide a lengthy list of potential future research in the realm of fairness and data-driven policing which this work uncovered.
- [749] arXiv:2401.06167 (cross-list from cs.CV) [ pdf , other ]
-
Title: Enhancing Multimodal Understanding with CLIP-Based Image-to-Text TransformationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The process of transforming input images into corresponding textual explanations stands as a crucial and complex endeavor within the domains of computer vision and natural language processing. In this paper, we propose an innovative ensemble approach that harnesses the capabilities of Contrastive Language-Image Pretraining models.
- [750] arXiv:2401.06168 (cross-list from cs.GT) [ pdf , other ]
-
Title: A Survey on Game Theory Optimal PokerSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI)
Abstract: Poker is in the family of imperfect information games unlike other games such as chess, connect four, etc which are perfect information game instead. While many perfect information games have been solved, no non-trivial imperfect information game has been solved to date. This makes poker a great test bed for Artificial Intelligence research. In this paper we firstly compare Game theory optimal poker to Exploitative poker. Secondly, we discuss the intricacies of abstraction techniques, betting models, and specific strategies employed by successful poker bots like Tartanian[1] and Pluribus[6]. Thirdly, we also explore 2-player vs multi-player games and the limitations that come when playing with more players. Finally, this paper discusses the role of machine learning and theoretical approaches in developing winning strategies and suggests future directions for this rapidly evolving field.
- [751] arXiv:2401.06171 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Harnessing Artificial Intelligence for Sustainable Agricultural Development in Africa: Opportunities, Challenges, and ImpactAuthors: Kinyua GikundaSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: This paper explores the transformative potential of artificial intelligence (AI) in the context of sustainable agricultural development across diverse regions in Africa. Delving into opportunities, challenges, and impact, the study navigates through the dynamic landscape of AI applications in agriculture. Opportunities such as precision farming, crop monitoring, and climate-resilient practices are examined, alongside challenges related to technological infrastructure, data accessibility, and skill gaps. The article analyzes the impact of AI on smallholder farmers, supply chains, and inclusive growth. Ethical considerations and policy implications are also discussed, offering insights into responsible AI integration. By providing a nuanced understanding, this paper contributes to the ongoing discourse on leveraging AI for fostering sustainability in African agriculture.
- [752] arXiv:2401.06175 (cross-list from cs.SE) [ pdf , other ]
-
Title: MTAD: Tools and Benchmarks for Multivariate Time Series Anomaly DetectionComments: The code and datasets are available at this https URLSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Key Performance Indicators (KPIs) are essential time-series metrics for ensuring the reliability and stability of many software systems. They faithfully record runtime states to facilitate the understanding of anomalous system behaviors and provide informative clues for engineers to pinpoint the root causes. The unprecedented scale and complexity of modern software systems, however, make the volume of KPIs explode. Consequently, many traditional methods of KPI anomaly detection become impractical, which serves as a catalyst for the fast development of machine learning-based solutions in both academia and industry. However, there is currently a lack of rigorous comparison among these KPI anomaly detection methods, and re-implementation demands a non-trivial effort. Moreover, we observe that different works adopt independent evaluation processes with different metrics. Some of them may not fully reveal the capability of a model and some are creating an illusion of progress. To better understand the characteristics of different KPI anomaly detectors and address the evaluation issue, in this paper, we provide a comprehensive review and evaluation of twelve state-of-the-art methods, and propose a novel metric called salience. Particularly, the selected methods include five traditional machine learning-based methods and seven deep learning-based methods. These methods are evaluated with five multivariate KPI datasets that are publicly available. A unified toolkit with easy-to-use interfaces is also released. We report the benchmark results in terms of accuracy, salience, efficiency, and delay, which are of practical importance for industrial deployment. We believe our work can contribute as a basis for future academic research and industrial application.
- [753] arXiv:2401.06176 (cross-list from cs.LG) [ pdf , other ]
-
Title: GOODAT: Towards Test-time Graph Out-of-Distribution DetectionAuthors: Luzhi Wang , Dongxiao He , He Zhang , Yixin Liu , Wenjie Wang , Shirui Pan , Di Jin , Tat-Seng ChuaComments: 9 pages, 5 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph neural networks (GNNs) have found widespread application in modeling graph data across diverse domains. While GNNs excel in scenarios where the testing data shares the distribution of their training counterparts (in distribution, ID), they often exhibit incorrect predictions when confronted with samples from an unfamiliar distribution (out-of-distribution, OOD). To identify and reject OOD samples with GNNs, recent studies have explored graph OOD detection, often focusing on training a specific model or modifying the data on top of a well-trained GNN. Despite their effectiveness, these methods come with heavy training resources and costs, as they need to optimize the GNN-based models on training data. Moreover, their reliance on modifying the original GNNs and accessing training data further restricts their universality. To this end, this paper introduces a method to detect Graph Out-of-Distribution At Test-time (namely GOODAT), a data-centric, unsupervised, and plug-and-play solution that operates independently of training data and modifications of GNN architecture. With a lightweight graph masker, GOODAT can learn informative subgraphs from test samples, enabling the capture of distinct graph patterns between OOD and ID samples. To optimize the graph masker, we meticulously design three unsupervised objective functions based on the graph information bottleneck principle, motivating the masker to capture compact yet informative subgraphs for OOD detection. Comprehensive evaluations confirm that our GOODAT method outperforms state-of-the-art benchmarks across a variety of real-world datasets. The code is available at Github: this https URL
- [754] arXiv:2401.06178 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: AI Art is Theft: Labour, Extraction, and Exploitation, Or, On the Dangers of Stochastic PollocksAuthors: Trystan S. GoetzeComments: 25 pages. Under reviewSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Since the launch of applications such as DALL-E, Midjourney, and Stable Diffusion, generative artificial intelligence has been controversial as a tool for creating artwork. While some have presented longtermist worries about these technologies as harbingers of fully automated futures to come, more pressing is the impact of generative AI on creative labour in the present. Already, business leaders have begun replacing human artistic labour with AI-generated images. In response, the artistic community has launched a protest movement, which argues that AI image generation is a kind of theft. This paper analyzes, substantiates, and critiques these arguments, concluding that AI image generators involve an unethical kind of labour theft. If correct, many other AI applications also rely upon theft.
- [755] arXiv:2401.06194 (cross-list from cs.LG) [ pdf , other ]
-
Title: CrisisKAN: Knowledge-infused and Explainable Multimodal Attention Network for Crisis Event ClassificationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Pervasive use of social media has become the emerging source for real-time information (like images, text, or both) to identify various events. Despite the rapid growth of image and text-based event classification, the state-of-the-art (SOTA) models find it challenging to bridge the semantic gap between features of image and text modalities due to inconsistent encoding. Also, the black-box nature of models fails to explain the model's outcomes for building trust in high-stakes situations such as disasters, pandemic. Additionally, the word limit imposed on social media posts can potentially introduce bias towards specific events. To address these issues, we proposed CrisisKAN, a novel Knowledge-infused and Explainable Multimodal Attention Network that entails images and texts in conjunction with external knowledge from Wikipedia to classify crisis events. To enrich the context-specific understanding of textual information, we integrated Wikipedia knowledge using proposed wiki extraction algorithm. Along with this, a guided cross-attention module is implemented to fill the semantic gap in integrating visual and textual data. In order to ensure reliability, we employ a model-specific approach called Gradient-weighted Class Activation Mapping (Grad-CAM) that provides a robust explanation of the predictions of the proposed model. The comprehensive experiments conducted on the CrisisMMD dataset yield in-depth analysis across various crisis-specific tasks and settings. As a result, CrisisKAN outperforms existing SOTA methodologies and provides a novel view in the domain of explainable multimodal event classification.
- [756] arXiv:2401.06195 (cross-list from cs.ET) [ pdf , other ]
-
Title: NeuSpin: Design of a Reliable Edge Neuromorphic System Based on Spintronics for Green AISubjects: Emerging Technologies (cs.ET) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: Internet of Things (IoT) and smart wearable devices for personalized healthcare will require storing and computing ever-increasing amounts of data. The key requirements for these devices are ultra-low-power, high-processing capabilities, autonomy at low cost, as well as reliability and accuracy to enable Green AI at the edge. Artificial Intelligence (AI) models, especially Bayesian Neural Networks (BayNNs) are resource-intensive and face challenges with traditional computing architectures due to the memory wall problem. Computing-in-Memory (CIM) with emerging resistive memories offers a solution by combining memory blocks and computing units for higher efficiency and lower power consumption. However, implementing BayNNs on CIM hardware, particularly with spintronic technologies, presents technical challenges due to variability and manufacturing defects. The NeuSPIN project aims to address these challenges through full-stack hardware and software co-design, developing novel algorithmic and circuit design approaches to enhance the performance, energy-efficiency and robustness of BayNNs on sprintronic-based CIM platforms.
- [757] arXiv:2401.06204 (cross-list from cs.LG) [ pdf , other ]
-
Title: An Exploratory Assessment of LLM's Potential Toward Flight Trajectory Reconstruction AnalysisComments: 6 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Large Language Models (LLMs) hold transformative potential in aviation, particularly in reconstructing flight trajectories. This paper investigates this potential, grounded in the notion that LLMs excel at processing sequential data and deciphering complex data structures. Utilizing the LLaMA 2 model, a pre-trained open-source LLM, the study focuses on reconstructing flight trajectories using Automatic Dependent Surveillance-Broadcast (ADS-B) data with irregularities inherent in real-world scenarios. The findings demonstrate the model's proficiency in filtering noise and estimating both linear and curved flight trajectories. However, the analysis also reveals challenges in managing longer data sequences, which may be attributed to the token length limitations of LLM models. The study's insights underscore the promise of LLMs in flight trajectory reconstruction and open new avenues for their broader application across the aviation and transportation sectors.
- [758] arXiv:2401.06210 (cross-list from cs.LG) [ pdf , other ]
-
Title: Learning Unsupervised Semantic Document Representation for Fine-grained Aspect-based Sentiment AnalysisComments: International ACM SIGIR Conference 2019Journal-ref: SIGIR 2019: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 1105 to 1108Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract: Document representation is the core of many NLP tasks on machine understanding. A general representation learned in an unsupervised manner reserves generality and can be used for various applications. In practice, sentiment analysis (SA) has been a challenging task that is regarded to be deeply semantic-related and is often used to assess general representations. Existing methods on unsupervised document representation learning can be separated into two families: sequential ones, which explicitly take the ordering of words into consideration, and non-sequential ones, which do not explicitly do so. However, both of them suffer from their own weaknesses. In this paper, we propose a model that overcomes difficulties encountered by both families of methods. Experiments show that our model outperforms state-of-the-art methods on popular SA datasets and a fine-grained aspect-based SA by a large margin.
- [759] arXiv:2401.06308 (cross-list from cs.NI) [ pdf , other ]
-
Title: A Semantic-Aware Multiple Access Scheme for Distributed, Dynamic 6G-Based ApplicationsComments: Accepted at the 2024 IEEE Wireless Communications and Networking Conference (WCNC 2024)Subjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: The emergence of the semantic-aware paradigm presents opportunities for innovative services, especially in the context of 6G-based applications. Although significant progress has been made in semantic extraction techniques, the incorporation of semantic information into resource allocation decision-making is still in its early stages, lacking consideration of the requirements and characteristics of future systems. In response, this paper introduces a novel formulation for the problem of multiple access to the wireless spectrum. It aims to optimize the utilization-fairness trade-off, using the $\alpha$-fairness metric, while accounting for user data correlation by introducing the concepts of self- and assisted throughputs. Initially, the problem is analyzed to identify its optimal solution. Subsequently, a Semantic-Aware Multi-Agent Double and Dueling Deep Q-Learning (SAMA-D3QL) technique is proposed. This method is grounded in Model-free Multi-Agent Deep Reinforcement Learning (MADRL), enabling the user equipment to autonomously make decisions regarding wireless spectrum access based solely on their local individual observations. The efficiency of the proposed technique is evaluated through two scenarios: single-channel and multi-channel. The findings illustrate that, across a spectrum of $\alpha$ values, association matrices, and channels, SAMA-D3QL consistently outperforms alternative approaches. This establishes it as a promising candidate for facilitating the realization of future federated, dynamically evolving applications.
- [760] arXiv:2401.06318 (cross-list from cs.LG) [ pdf , other ]
-
Title: Striking a Balance in Fairness for Dynamic Systems Through Reinforcement LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: While significant advancements have been made in the field of fair machine learning, the majority of studies focus on scenarios where the decision model operates on a static population. In this paper, we study fairness in dynamic systems where sequential decisions are made. Each decision may shift the underlying distribution of features or user behavior. We model the dynamic system through a Markov Decision Process (MDP). By acknowledging that traditional fairness notions and long-term fairness are distinct requirements that may not necessarily align with one another, we propose an algorithmic framework to integrate various fairness considerations with reinforcement learning using both pre-processing and in-processing approaches. Three case studies show that our method can strike a balance between traditional fairness notions, long-term fairness, and utility.
- [761] arXiv:2401.06340 (cross-list from cs.HC) [ pdf , other ]
-
Title: A Temporal-Spectral Fusion Transformer with Subject-specific Adapter for Enhancing RSVP-BCI DecodingComments: 19 pages, 10 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: The Rapid Serial Visual Presentation (RSVP)-based Brain-Computer Interface (BCI) is an efficient technology for target retrieval using electroencephalography (EEG) signals. The performance improvement of traditional decoding methods relies on a substantial amount of training data from new test subjects, which increases preparation time for BCI systems. Several studies introduce data from existing subjects to reduce the dependence of performance improvement on data from new subjects, but their optimization strategy based on adversarial learning with extensive data increases training time during the preparation procedure. Moreover, most previous methods only focus on the single-view information of EEG signals, but ignore the information from other views which may further improve performance. To enhance decoding performance while reducing preparation time, we propose a Temporal-Spectral fusion transformer with Subject-specific Adapter (TSformer-SA). Specifically, a cross-view interaction module is proposed to facilitate information transfer and extract common representations across two-view features extracted from EEG temporal signals and spectrogram images. Then, an attention-based fusion module fuses the features of two views to obtain comprehensive discriminative features for classification. Furthermore, a multi-view consistency loss is proposed to maximize the feature similarity between two views of the same EEG signal. Finally, we propose a subject-specific adapter to rapidly transfer the knowledge of the model trained on data from existing subjects to decode data from new subjects. Experimental results show that TSformer-SA significantly outperforms comparison methods and achieves outstanding performance with limited training data from new subjects. This facilitates efficient decoding and rapid deployment of BCI systems in practical use.
- [762] arXiv:2401.06370 (cross-list from cs.CV) [ pdf , other ]
-
Title: Graph Relation Distillation for Efficient Biomedical Instance SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Instance-aware embeddings predicted by deep neural networks have revolutionized biomedical instance segmentation, but its resource requirements are substantial. Knowledge distillation offers a solution by transferring distilled knowledge from heavy teacher networks to lightweight yet high-performance student networks. However, existing knowledge distillation methods struggle to extract knowledge for distinguishing instances and overlook global relation information. To address these challenges, we propose a graph relation distillation approach for efficient biomedical instance segmentation, which considers three essential types of knowledge: instance-level features, instance relations, and pixel-level boundaries. We introduce two graph distillation schemes deployed at both the intra-image level and the inter-image level: instance graph distillation (IGD) and affinity graph distillation (AGD). IGD constructs a graph representing instance features and relations, transferring these two types of knowledge by enforcing instance graph consistency. AGD constructs an affinity graph representing pixel relations to capture structured knowledge of instance boundaries, transferring boundary-related knowledge by ensuring pixel affinity consistency. Experimental results on a number of biomedical datasets validate the effectiveness of our approach, enabling student models with less than $ 1\%$ parameters and less than $10\%$ inference time while achieving promising performance compared to teacher models.
- [763] arXiv:2401.06373 (cross-list from cs.CL) [ pdf , other ]
-
Title: How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMsComments: 14 pages of the main text, qualitative examples of jailbreaks may be harmful in natureSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Most traditional AI safety research has approached AI models as machines and centered on algorithm-focused attacks developed by security experts. As large language models (LLMs) become increasingly common and competent, non-expert users can also impose risks during daily interactions. This paper introduces a new perspective to jailbreak LLMs as human-like communicators, to explore this overlooked intersection between everyday language interaction and AI safety. Specifically, we study how to persuade LLMs to jailbreak them. First, we propose a persuasion taxonomy derived from decades of social science research. Then, we apply the taxonomy to automatically generate interpretable persuasive adversarial prompts (PAP) to jailbreak LLMs. Results show that persuasion significantly increases the jailbreak performance across all risk categories: PAP consistently achieves an attack success rate of over $92\%$ on Llama 2-7b Chat, GPT-3.5, and GPT-4 in $10$ trials, surpassing recent algorithm-focused attacks. On the defense side, we explore various mechanisms against PAP and, found a significant gap in existing defenses, and advocate for more fundamental mitigation for highly interactive LLMs
- [764] arXiv:2401.06382 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: What should I say? -- Interacting with AI and Natural Language InterfacesAuthors: Mark AdkinsComments: 6 pages, 12 figures, 12 data tables, study data included in appendixSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: As Artificial Intelligence (AI) technology becomes more and more prevalent, it becomes increasingly important to explore how we as humans interact with AI. The Human-AI Interaction (HAI) sub-field has emerged from the Human-Computer Interaction (HCI) field and aims to examine this very notion. Many interaction patterns have been implemented without fully understanding the changes in required cognition as well as the cognitive science implications of using these alternative interfaces that aim to be more human-like in nature. Prior research suggests that theory of mind representations are crucial to successful and effortless communication, however very little is understood when it comes to how theory of mind representations are established when interacting with AI.
- [765] arXiv:2401.06394 (cross-list from cs.CL) [ pdf , other ]
-
Title: Adaptive Data Augmentation for Aspect Sentiment Quad PredictionComments: Accepted by ICASSP 2024, 5 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Aspect sentiment quad prediction (ASQP) aims to predict the quad sentiment elements for a given sentence, which is a critical task in the field of aspect-based sentiment analysis. However, the data imbalance issue has not received sufficient attention in ASQP task. In this paper, we divide the issue into two-folds, quad-pattern imbalance and aspect-category imbalance, and propose an Adaptive Data Augmentation (ADA) framework to tackle the imbalance issue. Specifically, a data augmentation process with a condition function adaptively enhances the tail quad patterns and aspect categories, alleviating the data imbalance in ASQP. Following previous studies, we also further explore the generative framework for extracting complete quads by introducing the category prior knowledge and syntax-guided decoding target. Experimental results demonstrate that data augmentation for imbalance in ASQP task can improve the performance, and the proposed ADA method is superior to naive data oversampling.
- [766] arXiv:2401.06401 (cross-list from cs.SE) [ src ]
-
Title: DevEval: Evaluating Code Generation in Practical Software ProjectsAuthors: Jia Li , Ge Li , Yunfei Zhao , Yongmin Li , Zhi Jin , Hao Zhu , Huanyu Liu , Kaibo Liu , Lecheng Wang , Zheng Fang , Lanshen Wang , Jiazheng Ding , Xuanming Zhang , Yihong Dong , Yuqi Zhu , Bin Gu , Mengfei YangComments: We are re-checking this benchmark and repeating related experiments. New versions of DevEval will be released laterSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: How to evaluate Large Language Models (LLMs) in code generation is an open question. Many benchmarks have been proposed but are inconsistent with practical software projects, e.g., unreal program distributions, insufficient dependencies, and small-scale project contexts. Thus, the capabilities of LLMs in practical projects are still unclear. In this paper, we propose a new benchmark named DevEval, aligned with Developers' experiences in practical projects. DevEval is collected through a rigorous pipeline, containing 2,690 samples from 119 practical projects and covering 10 domains. Compared to previous benchmarks, DevEval aligns to practical projects in multiple dimensions, e.g., real program distributions, sufficient dependencies, and enough-scale project contexts. We assess five popular LLMs on DevEval (e.g., gpt-4, gpt-3.5-turbo, CodeLLaMa, and StarCoder) and reveal their actual abilities in code generation. For instance, the highest Pass@1 of gpt-3.5-turbo only is 42 in our experiments. We also discuss the challenges and future directions of code generation in practical projects. We open-source DevEval and hope it can facilitate the development of code generation in practical projects.
- [767] arXiv:2401.06406 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A reviewAuthors: Lingchao Mao , Hairong Wang , Leland S. Hu , Nhan L Tran , Peter D Canoll , Kristin R Swanson , Jing LiComments: 41 pages, 4 figures, 2 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Cancer remains one of the most challenging diseases to treat in the medical field. Machine learning has enabled in-depth analysis of rich multi-omics profiles and medical imaging for cancer diagnosis and prognosis. Despite these advancements, machine learning models face challenges stemming from limited labeled sample sizes, the intricate interplay of high-dimensionality data types, the inherent heterogeneity observed among patients and within tumors, and concerns about interpretability and consistency with existing biomedical knowledge. One approach to surmount these challenges is to integrate biomedical knowledge into data-driven models, which has proven potential to improve the accuracy, robustness, and interpretability of model results. Here, we review the state-of-the-art machine learning studies that adopted the fusion of biomedical knowledge and data, termed knowledge-informed machine learning, for cancer diagnosis and prognosis. Emphasizing the properties inherent in four primary data types including clinical, imaging, molecular, and treatment data, we highlight modeling considerations relevant to these contexts. We provide an overview of diverse forms of knowledge representation and current strategies of knowledge integration into machine learning pipelines with concrete examples. We conclude the review article by discussing future directions to advance cancer research through knowledge-informed machine learning.
- [768] arXiv:2401.06416 (cross-list from cs.CL) [ pdf , other ]
-
Title: Mission: Impossible Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Chomsky and others have very directly claimed that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn. However, there is very little published experimental evidence to support such a claim. Here, we develop a set of synthetic impossible languages of differing complexity, each designed by systematically altering English data with unnatural word orders and grammar rules. These languages lie on an impossibility continuum: at one end are languages that are inherently impossible, such as random and irreversible shuffles of English words, and on the other, languages that may not be intuitively impossible but are often considered so in linguistics, particularly those with rules based on counting word positions. We report on a wide range of evaluations to assess the capacity of GPT-2 small models to learn these uncontroversially impossible languages, and crucially, we perform these assessments at various stages throughout training to compare the learning process for each language. Our core finding is that GPT-2 struggles to learn impossible languages when compared to English as a control, challenging the core claim. More importantly, we hope our approach opens up a productive line of inquiry in which different LLM architectures are tested on a variety of impossible languages in an effort to learn more about how LLMs can be used as tools for these cognitive and typological investigations.
- [769] arXiv:2401.06421 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Uncertainty quantification for probabilistic machine learning in earth observation using conformal predictionAuthors: Geethen Singh , Glenn Moncrieff , Zander Venter , Kerry Cawse-Nicholson , Jasper Slingsby , Tamara B RobinsonSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Unreliable predictions can occur when using artificial intelligence (AI) systems with negative consequences for downstream applications, particularly when employed for decision-making. Conformal prediction provides a model-agnostic framework for uncertainty quantification that can be applied to any dataset, irrespective of its distribution, post hoc. In contrast to other pixel-level uncertainty quantification methods, conformal prediction operates without requiring access to the underlying model and training dataset, concurrently offering statistically valid and informative prediction regions, all while maintaining computational efficiency. In response to the increased need to report uncertainty alongside point predictions, we bring attention to the promise of conformal prediction within the domain of Earth Observation (EO) applications. To accomplish this, we assess the current state of uncertainty quantification in the EO domain and found that only 20% of the reviewed Google Earth Engine (GEE) datasets incorporated a degree of uncertainty information, with unreliable methods prevalent. Next, we introduce modules that seamlessly integrate into existing GEE predictive modelling workflows and demonstrate the application of these tools for datasets spanning local to global scales, including the Dynamic World and Global Ecosystem Dynamics Investigation (GEDI) datasets. These case studies encompass regression and classification tasks, featuring both traditional and deep learning-based workflows. Subsequently, we discuss the opportunities arising from the use of conformal prediction in EO. We anticipate that the increased availability of easy-to-use implementations of conformal predictors, such as those provided here, will drive wider adoption of rigorous uncertainty quantification in EO, thereby enhancing the reliability of uses such as operational monitoring and decision making.
- [770] arXiv:2401.06426 (cross-list from cs.CV) [ pdf , other ]
-
Title: UPDP: A Unified Progressive Depth Pruner for CNN and Vision TransformerAuthors: Ji Liu , Dehua Tang , Yuanxian Huang , Li Zhang , Xiaocheng Zeng , Dong Li , Mingjie Lu , Jinzhang Peng , Yu Wang , Fan Jiang , Lu Tian , Ashish SirasaoSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Traditional channel-wise pruning methods by reducing network channels struggle to effectively prune efficient CNN models with depth-wise convolutional layers and certain efficient modules, such as popular inverted residual blocks. Prior depth pruning methods by reducing network depths are not suitable for pruning some efficient models due to the existence of some normalization layers. Moreover, finetuning subnet by directly removing activation layers would corrupt the original model weights, hindering the pruned model from achieving high performance. To address these issues, we propose a novel depth pruning method for efficient models. Our approach proposes a novel block pruning strategy and progressive training method for the subnet. Additionally, we extend our pruning method to vision transformer models. Experimental results demonstrate that our method consistently outperforms existing depth pruning methods across various pruning configurations. We obtained three pruned ConvNeXtV1 models with our method applying on ConvNeXtV1, which surpass most SOTA efficient models with comparable inference performance. Our method also achieves state-of-the-art pruning performance on the vision transformer model.
- [771] arXiv:2401.06431 (cross-list from cs.CL) [ pdf , other ]
-
Title: From Automation to Augmentation: Large Language Models Elevating Essay Scoring LandscapeSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Receiving immediate and personalized feedback is crucial for second-language learners, and Automated Essay Scoring (AES) systems are a vital resource when human instructors are unavailable. This study investigates the effectiveness of Large Language Models (LLMs), specifically GPT-4 and fine-tuned GPT-3.5, as tools for AES. Our comprehensive set of experiments, conducted on both public and private datasets, highlights the remarkable advantages of LLM-based AES systems. They include superior accuracy, consistency, generalizability, and interpretability, with fine-tuned GPT-3.5 surpassing traditional grading models. Additionally, we undertake LLM-assisted human evaluation experiments involving both novice and expert graders. One pivotal discovery is that LLMs not only automate the grading process but also enhance the performance of human graders. Novice graders when provided with feedback generated by LLMs, achieve a level of accuracy on par with experts, while experts become more efficient and maintain greater consistency in their assessments. These results underscore the potential of LLMs in educational technology, paving the way for effective collaboration between humans and AI, ultimately leading to transformative learning experiences through AI-generated feedback.
- [772] arXiv:2401.06436 (cross-list from cs.LG) [ pdf , other ]
-
Title: Improving Graph Convolutional Networks with Transformer Layer in social-based items recommendationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: In this work, we have proposed an approach for improving the GCN for predicting ratings in social networks. Our model is expanded from the standard model with several layers of transformer architecture. The main focus of the paper is on the encoder architecture for node embedding in the network. Using the embedding layer from the graph-based convolution layer, the attention mechanism could rearrange the feature space to get a more efficient embedding for the downstream task. The experiments showed that our proposed architecture achieves better performance than GCN on the traditional link prediction task.
- [773] arXiv:2401.06437 (cross-list from cs.GR) [ pdf , other ]
-
Title: 3D-PreMise: Can Large Language Models Generate 3D Shapes with Sharp Features and Parametric Control?Comments: 10 pages, 6 figuresSubjects: Graphics (cs.GR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Recent advancements in implicit 3D representations and generative models have markedly propelled the field of 3D object generation forward. However, it remains a significant challenge to accurately model geometries with defined sharp features under parametric controls, which is crucial in fields like industrial design and manufacturing. To bridge this gap, we introduce a framework that employs Large Language Models (LLMs) to generate text-driven 3D shapes, manipulating 3D software via program synthesis. We present 3D-PreMise, a dataset specifically tailored for 3D parametric modeling of industrial shapes, designed to explore state-of-the-art LLMs within our proposed pipeline. Our work reveals effective generation strategies and delves into the self-correction capabilities of LLMs using a visual interface. Our work highlights both the potential and limitations of LLMs in 3D parametric modeling for industrial applications.
- [774] arXiv:2401.06461 (cross-list from cs.SE) [ pdf , other ]
-
Title: Between Lines of Code: Unraveling the Distinct Patterns of Machine and Human ProgrammersComments: code available at this https URLSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Large language models have catalyzed an unprecedented wave in code generation. While achieving significant advances, they blur the distinctions between machine- and human-authored source code, causing integrity and authenticity issues of software artifacts. Previous methods such as DetectGPT have proven effective in discerning machine-generated texts, but they do not identify and harness the unique patterns of machine-generated code. Thus, its applicability falters when applied to code. In this paper, we carefully study the specific patterns that characterize machine- and human-authored code. Through a rigorous analysis of code attributes such as lexical diversity, conciseness, and naturalness, we expose unique patterns inherent to each source. We particularly notice that the syntactic segmentation of code is a critical factor in identifying its provenance. Based on our findings, we propose DetectCodeGPT, a novel method for detecting machine-generated code, which improves DetectGPT by capturing the distinct stylized patterns of code. Diverging from conventional techniques that depend on external LLMs for perturbations, DetectCodeGPT perturbs the code corpus by strategically inserting spaces and newlines, ensuring both efficacy and efficiency. Experiment results show that our approach significantly outperforms state-of-the-art techniques in detecting machine-generated code.
- [775] arXiv:2401.06466 (cross-list from cs.CL) [ pdf , other ]
-
Title: PersianMind: A Cross-Lingual Persian-English Large Language ModelSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models demonstrate remarkable proficiency in various linguistic tasks and have extensive knowledge across various domains. Although they perform best in English, their ability in other languages is notable too. In contrast, open-source models, such as LLaMa, are primarily trained on English datasets, resulting in poor performance in non-English languages. In this paper, we introduce PersianMind, an open-source bilingual large language model which demonstrates comparable performance to closed-source GPT-3.5-turbo in the Persian language. By expanding LLaMa2's vocabulary with 10,000 Persian tokens and training it on a dataset comprising nearly 2 billion Persian tokens, we show that our approach preserves the model's English knowledge and employs transfer learning to excel at transferring task knowledge from one language to another.
- [776] arXiv:2401.06477 (cross-list from cs.CL) [ pdf , other ]
-
Title: Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-TranslationAuthors: Tianyu Zheng , Shuyue Guo , Xingwei Qu , Jiawei Guo , Weixu Zhang , Xinrun Du , Qi Jia , Chenghua Lin , Wenhao Huang , Wenhu Chen , Jie Fu , Ge ZhangComments: 12 pages, 12 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we introduce Kun, a novel approach for creating high-quality instruction-tuning datasets for large language models (LLMs) without relying on manual annotations. Adapting a self-training algorithm based on instruction back-translation and answer polishment, Kun leverages unlabelled data from diverse sources such as Wudao, Wanjuan, and SkyPile to generate a substantial dataset of over a million Chinese instructional data points. This approach significantly deviates from traditional methods by using a self-curation process to refine and select the most effective instruction-output pairs. Our experiments with the 6B-parameter Yi model across various benchmarks demonstrate Kun's robustness and scalability. Our method's core contributions lie in its algorithmic advancement, which enhances data retention and clarity, and its innovative data generation approach that substantially reduces the reliance on costly and time-consuming manual annotations. This methodology presents a scalable and efficient solution for improving the instruction-following capabilities of LLMs, with significant implications for their application across diverse fields. The code and dataset can be found at this https URL
- [777] arXiv:2401.06493 (cross-list from cs.DB) [ pdf , other ]
-
Title: Expected Shapley-Like Scores of Boolean Functions: Complexity and Applications to Probabilistic DatabasesComments: 27 pages, including 20 pages of maintext. This is the authors' version of the corresponding PODS'2024 articleSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)
Abstract: Shapley values, originating in game theory and increasingly prominent in explainable AI, have been proposed to assess the contribution of facts in query answering over databases, along with other similar power indices such as Banzhaf values. In this work we adapt these Shapley-like scores to probabilistic settings, the objective being to compute their expected value. We show that the computations of expected Shapley values and of the expected values of Boolean functions are interreducible in polynomial time, thus obtaining the same tractability landscape. We investigate the specific tractable case where Boolean functions are represented as deterministic decomposable circuits, designing a polynomial-time algorithm for this setting. We present applications to probabilistic databases through database provenance, and an effective implementation of this algorithm within the ProvSQL system, which experimentally validates its feasibility over a standard benchmark.
- [778] arXiv:2401.06503 (cross-list from cs.CV) [ pdf , other ]
-
Title: Improving the Detection of Small Oriented Objects in Aerial ImagesComments: C. T. C. Doloriel and R. D. Cajote, "Improving the Detection of Small Oriented Objects in Aerial Images," 2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Waikoloa, HI, USA, 2023, pp. 176-185, doi: 10.1109/WACVW58289. 2023.00023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Small oriented objects that represent tiny pixel-area in large-scale aerial images are difficult to detect due to their size and orientation. Existing oriented aerial detectors have shown promising results but are mainly focused on orientation modeling with less regard to the size of the objects. In this work, we proposed a method to accurately detect small oriented objects in aerial images by enhancing the classification and regression tasks of the oriented object detection model. We designed the Attention-Points Network consisting of two losses: Guided-Attention Loss (GALoss) and Box-Points Loss (BPLoss). GALoss uses an instance segmentation mask as ground-truth to learn the attention features needed to improve the detection of small objects. These attention features are then used to predict box points for BPLoss, which determines the points' position relative to the target oriented bounding box. Experimental results show the effectiveness of our Attention-Points Network on a standard oriented aerial dataset with small object instances (DOTA-v1.5) and on a maritime-related dataset (HRSC2016). The code is publicly available.
- [779] arXiv:2401.06506 (cross-list from cs.CV) [ pdf , other ]
-
Title: Frequency Masking for Universal Deepfake DetectionComments: Accepted to IEEE ICASSP-2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: We study universal deepfake detection. Our goal is to detect synthetic images from a range of generative AI approaches, particularly from emerging ones which are unseen during training of the deepfake detector. Universal deepfake detection requires outstanding generalization capability. Motivated by recently proposed masked image modeling which has demonstrated excellent generalization in self-supervised pre-training, we make the first attempt to explore masked image modeling for universal deepfake detection. We study spatial and frequency domain masking in training deepfake detectors. Based on empirical analysis, we propose a novel deepfake detector via frequency masking. Our focus on frequency domain is different from the majority, which primarily target spatial domain detection. Our comparative analyses reveal substantial performance gains over existing methods. Code and models are publicly available.
- [780] arXiv:2401.06513 (cross-list from cs.SE) [ pdf , other ]
-
Title: ML-On-Rails: Safeguarding Machine Learning Models in Software Systems A Case StudyAuthors: Hala Abdelkader , Mohamed Abdelrazek , Scott Barnett , Jean-Guy Schneider , Priya Rani , Rajesh VasaSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Machine learning (ML), especially with the emergence of large language models (LLMs), has significantly transformed various industries. However, the transition from ML model prototyping to production use within software systems presents several challenges. These challenges primarily revolve around ensuring safety, security, and transparency, subsequently influencing the overall robustness and trustworthiness of ML models. In this paper, we introduce ML-On-Rails, a protocol designed to safeguard ML models, establish a well-defined endpoint interface for different ML tasks, and clear communication between ML providers and ML consumers (software engineers). ML-On-Rails enhances the robustness of ML models via incorporating detection capabilities to identify unique challenges specific to production ML. We evaluated the ML-On-Rails protocol through a real-world case study of the MoveReminder application. Through this evaluation, we emphasize the importance of safeguarding ML models in production.
- [781] arXiv:2401.06528 (cross-list from cs.CV) [ pdf , other ]
-
Title: PCB-Vision: A Multiscene RGB-Hyperspectral Benchmark Dataset of Printed Circuit BoardsAuthors: Elias Arbash , Margret Fuchs , Behnood Rasti , Sandra Lorenz , Pedram Ghamisi , Richard GloaguenSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Addressing the critical theme of recycling electronic waste (E-waste), this contribution is dedicated to developing advanced automated data processing pipelines as a basis for decision-making and process control. Aligning with the broader goals of the circular economy and the United Nations (UN) Sustainable Development Goals (SDG), our work leverages non-invasive analysis methods utilizing RGB and hyperspectral imaging data to provide both quantitative and qualitative insights into the E-waste stream composition for optimizing recycling efficiency. In this paper, we introduce 'PCB-Vision'; a pioneering RGB-hyperspectral printed circuit board (PCB) benchmark dataset, comprising 53 RGB images of high spatial resolution paired with their corresponding high spectral resolution hyperspectral data cubes in the visible and near-infrared (VNIR) range. Grounded in open science principles, our dataset provides a comprehensive resource for researchers through high-quality ground truths, focusing on three primary PCB components: integrated circuits (IC), capacitors, and connectors. We provide extensive statistical investigations on the proposed dataset together with the performance of several state-of-the-art (SOTA) models, including U-Net, Attention U-Net, Residual U-Net, LinkNet, and DeepLabv3+. By openly sharing this multi-scene benchmark dataset along with the baseline codes, we hope to foster transparent, traceable, and comparable developments of advanced data processing across various scientific communities, including, but not limited to, computer vision and remote sensing. Emphasizing our commitment to supporting a collaborative and inclusive scientific community, all materials, including code, data, ground truth, and masks, will be accessible at this https URL .
- [782] arXiv:2401.06538 (cross-list from cs.NI) [ pdf , other ]
-
Title: Intelligent Data-Driven Architectural Features Orchestration for Network SlicingAuthors: Rodrigo Moreira , Flavio de Oliveira Silva , Tereza Cristina Melo de Brito Carvalho , Joberto S. B. MartinsComments: 12 pages, 6 figures, Conference ADVANCE 24 - International Workshop on ADVANCEs in ICT Infrastructures and Services - February 26--29, 2024 - Hanoi, VietnamSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Network slicing is a crucial enabler and a trend for the Next Generation Mobile Network (NGMN) and various other new systems like the Internet of Vehicles (IoV) and Industrial IoT (IIoT). Orchestration and machine learning are key elements with a crucial role in the network-slicing processes since the NS process needs to orchestrate resources and functionalities, and machine learning can potentially optimize the orchestration process. However, existing network-slicing architectures lack the ability to define intelligent approaches to orchestrate features and resources in the slicing process. This paper discusses machine learning-based orchestration of features and capabilities in network slicing architectures. Initially, the slice resource orchestration and allocation in the slicing planning, configuration, commissioning, and operation phases are analyzed. In sequence, we highlight the need for optimized architectural feature orchestration and recommend using ML-embed agents, federated learning intrinsic mechanisms for knowledge acquisition, and a data-driven approach embedded in the network slicing architecture. We further develop an architectural features orchestration case embedded in the SFI2 network slicing architecture. An attack prevention security mechanism is developed for the SFI2 architecture using distributed embedded and cooperating ML agents. The case presented illustrates the architectural feature's orchestration process and benefits, highlighting its importance for the network slicing process.
- [783] arXiv:2401.06541 (cross-list from cs.CL) [ pdf , other ]
-
Title: Medical Dialogue Generation via Intuitive-then-Analytical Differential DiagnosisComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Medical dialogue systems have attracted growing research attention as they have the potential to provide rapid diagnoses, treatment plans, and health consultations. In medical dialogues, a proper diagnosis is crucial as it establishes the foundation for future consultations. Clinicians typically employ both intuitive and analytic reasoning to formulate a differential diagnosis. This reasoning process hypothesizes and verifies a variety of possible diseases and strives to generate a comprehensive and rigorous diagnosis. However, recent studies on medical dialogue generation have overlooked the significance of modeling a differential diagnosis, which hinders the practical application of these systems. To address the above issue, we propose a medical dialogue generation framework with the Intuitive-then-Analytic Differential Diagnosis (IADDx). Our method starts with a differential diagnosis via retrieval-based intuitive association and subsequently refines it through a graph-enhanced analytic procedure. The resulting differential diagnosis is then used to retrieve medical knowledge and guide response generation. Experimental results on two datasets validate the efficacy of our method. Besides, we demonstrate how our framework assists both clinicians and patients in understanding the diagnostic process, for instance, by producing intermediate results and graph-based diagnosis paths.
- [784] arXiv:2401.06550 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Multimodal Urban Areas of Interest Generation via Remote Sensing Imagery and Geographical PriorComments: 9 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Urban area-of-interest (AOI) refers to an integrated urban functional zone with defined polygonal boundaries. The rapid development of urban commerce has led to increasing demands for highly accurate and timely AOI data. However, existing research primarily focuses on coarse-grained functional zones for urban planning or regional economic analysis, and often neglects the expiration of AOI in the real world. They fail to fulfill the precision demands of Mobile Internet Online-to-Offline (O2O) businesses. These businesses require accuracy down to a specific community, school, or hospital. In this paper, we propose a comprehensive end-to-end multimodal deep learning framework designed for simultaneously detecting accurate AOI boundaries and validating the reliability of AOI by leveraging remote sensing imagery coupled with geographical prior, titled AOITR. Unlike conventional AOI generation methods, such as the Road-cut method that segments road networks at various levels, our approach diverges from semantic segmentation algorithms that depend on pixel-level classification. Instead, our AOITR begins by selecting a point-of-interest (POI) of specific category, and uses it to retrieve corresponding remote sensing imagery and geographical prior such as entrance POIs and road nodes. This information helps to build a multimodal detection model based on transformer encoder-decoder architecture to regress the AOI polygon. Additionally, we utilize the dynamic features from human mobility, nearby POIs, and logistics addresses for AOI reliability evaluation via a cascaded network module. The experimental results reveal that our algorithm achieves a significant improvement on Intersection over Union (IoU) metric, surpassing previous methods by a large margin.
- [785] arXiv:2401.06557 (cross-list from cs.LG) [ pdf , other ]
-
Title: Treatment-Aware Hyperbolic Representation Learning for Causal Effect Estimation with Social NetworksComments: Accepted by SIAM SDM'24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Methodology (stat.ME)
Abstract: Estimating the individual treatment effect (ITE) from observational data is a crucial research topic that holds significant value across multiple domains. How to identify hidden confounders poses a key challenge in ITE estimation. Recent studies have incorporated the structural information of social networks to tackle this challenge, achieving notable advancements. However, these methods utilize graph neural networks to learn the representation of hidden confounders in Euclidean space, disregarding two critical issues: (1) the social networks often exhibit a scalefree structure, while Euclidean embeddings suffer from high distortion when used to embed such graphs, and (2) each ego-centric network within a social network manifests a treatment-related characteristic, implying significant patterns of hidden confounders. To address these issues, we propose a novel method called Treatment-Aware Hyperbolic Representation Learning (TAHyper). Firstly, TAHyper employs the hyperbolic space to encode the social networks, thereby effectively reducing the distortion of confounder representation caused by Euclidean embeddings. Secondly, we design a treatment-aware relationship identification module that enhances the representation of hidden confounders by identifying whether an individual and her neighbors receive the same treatment. Extensive experiments on two benchmark datasets are conducted to demonstrate the superiority of our method.
- [786] arXiv:2401.06559 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A General Benchmark Framework is Dynamic Graph Neural Network NeedAuthors: Yusen ZhangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Dynamic graph learning is crucial for modeling real-world systems with evolving relationships and temporal dynamics. However, the lack of a unified benchmark framework in current research has led to inaccurate evaluations of dynamic graph models. This paper highlights the significance of dynamic graph learning and its applications in various domains. It emphasizes the need for a standardized benchmark framework that captures temporal dynamics, evolving graph structures, and downstream task requirements. Establishing a unified benchmark will help researchers understand the strengths and limitations of existing models, foster innovation, and advance dynamic graph learning. In conclusion, this paper identifies the lack of a standardized benchmark framework as a current limitation in dynamic graph learning research . Such a framework will facilitate accurate model evaluation, drive advancements in dynamic graph learning techniques, and enable the development of more effective models for real-world applications.
- [787] arXiv:2401.06568 (cross-list from cs.CL) [ pdf , other ]
-
Title: Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine TranslationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have achieved remarkable results in the machine translation evaluation task, yet there remains a gap in knowledge regarding how they utilize the provided data to conduct evaluations. This study aims to explore how LLMs leverage source and reference information in evaluating translations, with the ultimate goal of better understanding the working mechanism of LLMs. To this end, we design the controlled experiments across various input modes and model types, and employ both coarse-grained and fine-grained prompts to discern the utility of source versus reference information. Surprisingly, we find that reference information significantly enhances the evaluation accuracy, while source information sometimes is counterproductive, indicating a lack of cross-lingual capability when using LLMs to evaluate translations. We further conduct a meta-evaluation for translation error detection of LLMs, observing a similar phenomenon. These findings also suggest a potential research direction for LLMs that fully exploits the cross-lingual capability of LLMs to achieve better performance in machine translation evaluation tasks.
- [788] arXiv:2401.06583 (cross-list from cs.CL) [ pdf , other ]
-
Title: Mapping Transformer Leveraged Embeddings for Cross-Lingual Document RepresentationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: Recommendation systems, for documents, have become tools to find relevant content on the Web. However, these systems have limitations when it comes to recommending documents in languages different from the query language, which means they might overlook resources in non-native languages. This research focuses on representing documents across languages by using Transformer Leveraged Document Representations (TLDRs) that are mapped to a cross-lingual domain. Four multilingual pre-trained transformer models (mBERT, mT5 XLM RoBERTa, ErnieM) were evaluated using three mapping methods across 20 language pairs representing combinations of five selected languages of the European Union. Metrics like Mate Retrieval Rate and Reciprocal Rank were used to measure the effectiveness of mapped TLDRs compared to non-mapped ones. The results highlight the power of cross-lingual representations achieved through pre-trained transformers and mapping approaches suggesting a promising direction for expanding beyond language connections, between two specific languages.
- [789] arXiv:2401.06595 (cross-list from cs.LG) [ pdf , other ]
-
Title: Every Node is Different: Dynamically Fusing Self-Supervised Tasks for Attributed Graph ClusteringSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Attributed graph clustering is an unsupervised task that partitions nodes into different groups. Self-supervised learning (SSL) shows great potential in handling this task, and some recent studies simultaneously learn multiple SSL tasks to further boost performance. Currently, different SSL tasks are assigned the same set of weights for all graph nodes. However, we observe that some graph nodes whose neighbors are in different groups require significantly different emphases on SSL tasks. In this paper, we propose to dynamically learn the weights of SSL tasks for different nodes and fuse the embeddings learned from different SSL tasks to boost performance. We design an innovative graph clustering approach, namely Dynamically Fusing Self-Supervised Learning (DyFSS). Specifically, DyFSS fuses features extracted from diverse SSL tasks using distinct weights derived from a gating network. To effectively learn the gating network, we design a dual-level self-supervised strategy that incorporates pseudo labels and the graph structure. Extensive experiments on five datasets show that DyFSS outperforms the state-of-the-art multi-task SSL methods by up to 8.66% on the accuracy metric. The code of DyFSS is available at: this https URL .
- [790] arXiv:2401.06633 (cross-list from cs.IR) [ pdf , other ]
-
Title: Ada-Retrieval: An Adaptive Multi-Round Retrieval Paradigm for Sequential RecommendationsComments: 9 pages, Accepted to AAAI2024Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Retrieval models aim at selecting a small set of item candidates which match the preference of a given user. They play a vital role in large-scale recommender systems since subsequent models such as rankers highly depend on the quality of item candidates. However, most existing retrieval models employ a single-round inference paradigm, which may not adequately capture the dynamic nature of user preferences and stuck in one area in the item space. In this paper, we propose Ada-Retrieval, an adaptive multi-round retrieval paradigm for recommender systems that iteratively refines user representations to better capture potential candidates in the full item space. Ada-Retrieval comprises two key modules: the item representation adapter and the user representation adapter, designed to inject context information into items' and users' representations. The framework maintains a model-agnostic design, allowing seamless integration with various backbone models such as RNNs or Transformers. We perform experiments on three widely used public datasets, incorporating five powerful sequential recommenders as backbone models. Our results demonstrate that Ada-Retrieval significantly enhances the performance of various base models, with consistent improvements observed across different datasets. Our code and data are publicly available at: this https URL .
- [791] arXiv:2401.06634 (cross-list from cs.LG) [ pdf , other ]
-
Title: CCFC: Bridging Federated Clustering and Contrastive LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Federated clustering, an essential extension of centralized clustering for federated scenarios, enables multiple data-holding clients to collaboratively group data while keeping their data locally. In centralized scenarios, clustering driven by representation learning has made significant advancements in handling high-dimensional complex data. However, the combination of federated clustering and representation learning remains underexplored. To bridge this, we first tailor a cluster-contrastive model for learning clustering-friendly representations. Then, we harness this model as the foundation for proposing a new federated clustering method, named cluster-contrastive federated clustering (CCFC). Benefiting from representation learning, the clustering performance of CCFC even double those of the best baseline methods in some cases. Compared to the most related baseline, the benefit results in substantial NMI score improvements of up to 0.4155 on the most conspicuous case. Moreover, CCFC also shows superior performance in handling device failures from a practical viewpoint.
- [792] arXiv:2401.06640 (cross-list from cs.CL) [ pdf , other ]
-
Title: Experimental Contexts Can Facilitate Robust Semantic Property Inference in Language Models, but InconsistentlySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent zero-shot evaluations have highlighted important limitations in the abilities of language models (LMs) to perform meaning extraction. However, it is now well known that LMs can demonstrate radical improvements in the presence of experimental contexts such as in-context examples and instructions. How well does this translate to previously studied meaning-sensitive tasks? We present a case-study on the extent to which experimental contexts can improve LMs' robustness in performing property inheritance -- predicting semantic properties of novel concepts, a task that they have been previously shown to fail on. Upon carefully controlling the nature of the in-context examples and the instructions, our work reveals that they can indeed lead to non-trivial property inheritance behavior in LMs. However, this ability is inconsistent: with a minimal reformulation of the task, some LMs were found to pick up on shallow, non-semantic heuristics from their inputs, suggesting that the computational principles of semantic property inference are yet to be mastered by LMs.
- [793] arXiv:2401.06654 (cross-list from cs.CV) [ pdf , other ]
-
Title: Decoupling Pixel Flipping and Occlusion Strategy for Consistent XAI BenchmarksComments: 28 pages, 8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Feature removal is a central building block for eXplainable AI (XAI), both for occlusion-based explanations (Shapley values) as well as their evaluation (pixel flipping, PF). However, occlusion strategies can vary significantly from simple mean replacement up to inpainting with state-of-the-art diffusion models. This ambiguity limits the usefulness of occlusion-based approaches. For example, PF benchmarks lead to contradicting rankings. This is amplified by competing PF measures: Features are either removed starting with most influential first (MIF) or least influential first (LIF). This study proposes two complementary perspectives to resolve this disagreement problem. Firstly, we address the common criticism of occlusion-based XAI, that artificial samples lead to unreliable model evaluations. We propose to measure the reliability by the R(eference)-Out-of-Model-Scope (OMS) score. The R-OMS score enables a systematic comparison of occlusion strategies and resolves the disagreement problem by grouping consistent PF rankings. Secondly, we show that the insightfulness of MIF and LIF is conversely dependent on the R-OMS score. To leverage this, we combine the MIF and LIF measures into the symmetric relevance gain (SRG) measure. This breaks the inherent connection to the underlying occlusion strategy and leads to consistent rankings. This resolves the disagreement problem, which we verify for a set of 40 different occlusion strategies.
- [794] arXiv:2401.06676 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: LLMRS: Unlocking Potentials of LLM-Based Recommender Systems for Software PurchaseAuthors: Angela John , Theophilus Aidoo , Hamayoon Behmanush , Irem B. Gunduz , Hewan Shrestha , Maxx Richard Rahman , Wolfgang MaaßSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Recommendation systems are ubiquitous, from Spotify playlist suggestions to Amazon product suggestions. Nevertheless, depending on the methodology or the dataset, these systems typically fail to capture user preferences and generate general recommendations. Recent advancements in Large Language Models (LLM) offer promising results for analyzing user queries. However, employing these models to capture user preferences and efficiency remains an open question. In this paper, we propose LLMRS, an LLM-based zero-shot recommender system where we employ pre-trained LLM to encode user reviews into a review score and generate user-tailored recommendations. We experimented with LLMRS on a real-world dataset, the Amazon product reviews, for software purchase use cases. The results show that LLMRS outperforms the ranking-based baseline model while successfully capturing meaningful information from product reviews, thereby providing more reliable recommendations.
- [795] arXiv:2401.06683 (cross-list from cs.IR) [ pdf , other ]
-
Title: DQNC2S: DQN-based Cross-stream Crisis event SummarizerComments: accepted at ECIR 2024Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Summarizing multiple disaster-relevant data streams simultaneously is particularly challenging as existing Retrieve&Re-ranking strategies suffer from the inherent redundancy of multi-stream data and limited scalability in a multi-query setting. This work proposes an online approach to crisis timeline generation based on weak annotation with Deep Q-Networks. It selects on-the-fly the relevant pieces of text without requiring neither human annotations nor content re-ranking. This makes the inference time independent of the number of input queries. The proposed approach also incorporates a redundancy filter into the reward function to effectively handle cross-stream content overlaps. The achieved ROUGE and BERTScore results are superior to those of best-performing models on the CrisisFACTS 2022 benchmark.
- [796] arXiv:2401.06686 (cross-list from cs.HC) [ pdf , other ]
-
Title: Exploring Conversational Agents as an Effective Tool for Measuring Cognitive Biases in Decision-MakingAuthors: Stephen PilliSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Heuristics and cognitive biases are an integral part of human decision-making. Automatically detecting a particular cognitive bias could enable intelligent tools to provide better decision-support. Detecting the presence of a cognitive bias currently requires a hand-crafted experiment and human interpretation. Our research aims to explore conversational agents as an effective tool to measure various cognitive biases in different domains. Our proposed conversational agent incorporates a bias measurement mechanism that is informed by the existing experimental designs and various experimental tasks identified in the literature. Our initial experiments to measure framing and loss-aversion biases indicate that the conversational agents can be effectively used to measure the biases.
- [797] arXiv:2401.06692 (cross-list from cs.CL) [ pdf , other ]
-
Title: An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language ModelsAuthors: Gantavya Bhatt , Yifang Chen , Arnav M. Das , Jifan Zhang , Sang T. Truong , Stephen Mussmann , Yinglun Zhu , Jeffrey Bilmes , Simon S. Du , Kevin Jamieson , Jordan T. Ash , Robert D. NowakSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation efforts required to produce high quality responses for instructions are becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues to increase. Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool, but its high computational cost remains a barrier to its widespread applicability in the context of LLMs. To mitigate the annotation cost of SFT and circumvent the computational bottlenecks of active learning, we propose using experimental design. Experimental design techniques select the most informative samples to label, and typically maximize some notion of uncertainty and/or diversity. In our work, we implement a framework that evaluates several existing and novel experimental design techniques and find that these methods consistently yield significant gains in label efficiency with little computational overhead. On generative tasks, our methods achieve the same generalization performance with only $50\%$ of annotation cost required by random sampling.
- [798] arXiv:2401.06699 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Closed-form Solution for Weight Optimization in Fully-connected Feed-forward Neural NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This work addresses weight optimization problem for fully-connected feed-forward neural networks. Unlike existing approaches that are based on back-propagation (BP) and chain rule gradient-based optimization (which implies iterative execution, potentially burdensome and time-consuming in some cases), the proposed approach offers the solution for weight optimization in closed-form by means of least squares (LS) methodology. In the case where the input-to-output mapping is injective, the new approach optimizes the weights in a back-propagating fashion in a single iteration by jointly optimizing a set of weights in each layer for each neuron. In the case where the input-to-output mapping is not injective (e.g., in classification problems), the proposed solution is easily adapted to obtain its final solution in a few iterations. An important advantage over the existing solutions is that these computations (for all neurons in a layer) are independent from each other; thus, they can be carried out in parallel to optimize all weights in a given layer simultaneously. Furthermore, its running time is deterministic in the sense that one can obtain the exact number of computations necessary to optimize the weights in all network layers (per iteration, in the case of non-injective mapping). Our simulation and empirical results show that the proposed scheme, BPLS, works well and is competitive with existing ones in terms of accuracy, but significantly surpasses them in terms of running time. To summarize, the new method is straightforward to implement, is competitive and computationally more efficient than the existing ones, and is well-tailored for parallel implementation.
- [799] arXiv:2401.06709 (cross-list from cs.CL) [ pdf , other ]
-
Title: Reliability Analysis of Psychological Concept Extraction and Classification in User-penned TextSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The social NLP research community witness a recent surge in the computational advancements of mental health analysis to build responsible AI models for a complex interplay between language use and self-perception. Such responsible AI models aid in quantifying the psychological concepts from user-penned texts on social media. On thinking beyond the low-level (classification) task, we advance the existing binary classification dataset, towards a higher-level task of reliability analysis through the lens of explanations, posing it as one of the safety measures. We annotate the LoST dataset to capture nuanced textual cues that suggest the presence of low self-esteem in the posts of Reddit users. We further state that the NLP models developed for determining the presence of low self-esteem, focus more on three types of textual cues: (i) Trigger: words that triggers mental disturbance, (ii) LoST indicators: text indicators emphasizing low self-esteem, and (iii) Consequences: words describing the consequences of mental disturbance. We implement existing classifiers to examine the attention mechanism in pre-trained language models (PLMs) for a domain-specific psychology-grounded task. Our findings suggest the need of shifting the focus of PLMs from Trigger and Consequences to a more comprehensive explanation, emphasizing LoST indicators while determining low self-esteem in Reddit posts.
- [800] arXiv:2401.06715 (cross-list from cs.CL) [ pdf , other ]
-
Title: Reframing Tax Law Entailment as Analogical ReasoningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Statutory reasoning refers to the application of legislative provisions to a series of case facts described in natural language. We re-frame statutory reasoning as an analogy task, where each instance of the analogy task involves a combination of two instances of statutory reasoning. This increases the dataset size by two orders of magnitude, and introduces an element of interpretability. We show that this task is roughly as difficult to Natural Language Processing models as the original task. Finally, we come back to statutory reasoning, solving it with a combination of a retrieval mechanism and analogy models, and showing some progress on prior comparable work.
- [801] arXiv:2401.06730 (cross-list from cs.CL) [ pdf , other ]
-
Title: Relying on the Unreliable: The Impact of Language Models' Reluctance to Express UncertaintySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: As natural language becomes the default interface for human-AI interaction, there is a critical need for LMs to appropriately communicate uncertainties in downstream applications. In this work, we investigate how LMs incorporate confidence about their responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are unable to express uncertainties when answering questions even when they produce incorrect responses. LMs can be explicitly prompted to express confidences, but tend to be overconfident, resulting in high error rates (on average 47%) among confident responses. We test the risks of LM overconfidence by running human experiments and show that users rely heavily on LM generations, whether or not they are marked by certainty. Lastly, we investigate the preference-annotated datasets used in RLHF alignment and find that humans have a bias against texts with uncertainty. Our work highlights a new set of safety harms facing human-LM interactions and proposes design recommendations and mitigating strategies moving forward.
- [802] arXiv:2401.06742 (cross-list from cs.CL) [ pdf , other ]
-
Title: Using Natural Language Inference to Improve Persona Extraction from Dialogue in a New DomainAuthors: Alexandra DeLucia , Mengjie Zhao , Yoshinori Maeda , Makoto Yoda , Keiichi Yamada , Hiromi WakakiComments: Code and models will be released upon publicationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While valuable datasets such as PersonaChat provide a foundation for training persona-grounded dialogue agents, they lack diversity in conversational and narrative settings, primarily existing in the "real" world. To develop dialogue agents with unique personas, models are trained to converse given a specific persona, but hand-crafting these persona can be time-consuming, thus methods exist to automatically extract persona information from existing character-specific dialogue. However, these persona-extraction models are also trained on datasets derived from PersonaChat and struggle to provide high-quality persona information from conversational settings that do not take place in the real world, such as the fantasy-focused dataset, LIGHT. Creating new data to train models on a specific setting is human-intensive, thus prohibitively expensive. To address both these issues, we introduce a natural language inference method for post-hoc adapting a trained persona extraction model to a new setting. We draw inspiration from the literature of dialog natural language inference (NLI), and devise NLI-reranking methods to extract structured persona information from dialogue. Compared to existing persona extraction models, our method returns higher-quality extracted persona and requires less human annotation.
- [803] arXiv:2401.06751 (cross-list from cs.CL) [ pdf , other ]
-
Title: The Unreasonable Effectiveness of Easy Training Data for Hard TasksComments: 22 pages, 20 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: How can we train models to perform well on hard test data when hard training data is by definition difficult to label correctly? This question has been termed the scalable oversight problem and has drawn increasing attention as language models have continually improved. In this paper, we present the surprising conclusion that current language models often generalize relatively well from easy to hard data, even performing as well as "oracle" models trained on hard data. We demonstrate this kind of easy-to-hard generalization using simple training methods like in-context learning, linear classifier heads, and QLoRA for seven different measures of datapoint hardness, including six empirically diverse human hardness measures (like grade level) and one model-based measure (loss-based). Furthermore, we show that even if one cares most about model performance on hard data, it can be better to collect and train on easy data rather than hard data, since hard data is generally noisier and costlier to collect. Our experiments use open models up to 70b in size and four publicly available question-answering datasets with questions ranging in difficulty from 3rd grade science questions to college level STEM questions and general-knowledge trivia. We conclude that easy-to-hard generalization in LMs is surprisingly strong for the tasks studied, suggesting the scalable oversight problem may be easier than previously thought. Our code is available at this https URL
- [804] arXiv:2401.06757 (cross-list from cs.CV) [ pdf , other ]
-
Title: Synthetic Data Generation Framework, Dataset, and Efficient Deep Model for Pedestrian Intention PredictionJournal-ref: 26th IEEE International Conference on Intelligent Transportation Systems ITSC 2023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Pedestrian intention prediction is crucial for autonomous driving. In particular, knowing if pedestrians are going to cross in front of the ego-vehicle is core to performing safe and comfortable maneuvers. Creating accurate and fast models that predict such intentions from sequential images is challenging. A factor contributing to this is the lack of datasets with diverse crossing and non-crossing (C/NC) scenarios. We address this scarceness by introducing a framework, named ARCANE, which allows programmatically generating synthetic datasets consisting of C/NC video clip samples. As an example, we use ARCANE to generate a large and diverse dataset named PedSynth. We will show how PedSynth complements widely used real-world datasets such as JAAD and PIE, so enabling more accurate models for C/NC prediction. Considering the onboard deployment of C/NC prediction models, we also propose a deep model named PedGNN, which is fast and has a very low memory footprint. PedGNN is based on a GNN-GRU architecture that takes a sequence of pedestrian skeletons as input to predict crossing intentions.
- [805] arXiv:2401.06775 (cross-list from cs.CL) [ pdf , other ]
-
Title: Large language models in healthcare and medical domain: A reviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension. These models exhibit the remarkable capability to provide proficient responses to free-text queries, demonstrating a nuanced understanding of professional medical knowledge. This comprehensive survey delves into the functionalities of existing LLMs designed for healthcare applications, elucidating the trajectory of their development, starting from traditional Pretrained Language Models (PLMs) to the present state of LLMs in healthcare sector. First, we explore the potential of LLMs to amplify the efficiency and effectiveness of diverse healthcare applications, particularly focusing on clinical language understanding tasks. These tasks encompass a wide spectrum, ranging from named entity recognition and relation extraction to natural language inference, multi-modal medical applications, document classification, and question-answering. Additionally, we conduct an extensive comparison of the most recent state-of-the-art LLMs in the healthcare domain, while also assessing the utilization of various open-source LLMs and highlighting their significance in healthcare applications. Furthermore, we present the essential performance metrics employed to evaluate LLMs in the biomedical domain, shedding light on their effectiveness and limitations. Finally, we summarize the prominent challenges and constraints faced by large language models in the healthcare sector, offering a holistic perspective on their potential benefits and shortcomings. This review provides a comprehensive exploration of the current landscape of LLMs in healthcare, addressing their role in transforming medical applications and the areas that warrant further research and development.
- [806] arXiv:2401.06782 (cross-list from cs.CL) [ pdf , other ]
-
Title: Semantic Similarity Matching for Patent Documents Using Ensemble BERT-related Model and Novel Text Processing MethodComments: It accepted by The 6th International Conference on Machine Learning and Machine Intelligence (MLMI 2023)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In the realm of patent document analysis, assessing semantic similarity between phrases presents a significant challenge, notably amplifying the inherent complexities of Cooperative Patent Classification (CPC) research. Firstly, this study addresses these challenges, recognizing early CPC work while acknowledging past struggles with language barriers and document intricacy. Secondly, it underscores the persisting difficulties of CPC research.
To overcome these challenges and bolster the CPC system, This paper presents two key innovations. Firstly, it introduces an ensemble approach that incorporates four BERT-related models, enhancing semantic similarity accuracy through weighted averaging. Secondly, a novel text preprocessing method tailored for patent documents is introduced, featuring a distinctive input structure with token scoring that aids in capturing semantic relationships during CPC context training, utilizing BCELoss. Our experimental findings conclusively establish the effectiveness of both our Ensemble Model and novel text processing strategies when deployed on the U.S. Patent Phrase to Phrase Matching dataset. - [807] arXiv:2401.06783 (cross-list from cs.CL) [ pdf , other ]
-
Title: MultiSiam: A Multiple Input Siamese Network For Social Media Text Classification And Duplicate Text DetectionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract: Social media accounts post increasingly similar content, creating a chaotic experience across platforms, which makes accessing desired information difficult. These posts can be organized by categorizing and grouping duplicates across social handles and accounts. There can be more than one duplicate of a post, however, a conventional Siamese neural network only considers a pair of inputs for duplicate text detection. In this paper, we first propose a multiple-input Siamese network, MultiSiam. This condensed network is then used to propose another model, SMCD (Social Media Classification and Duplication Model) to perform both duplicate text grouping and categorization. The MultiSiam network, just like the Siamese, can be used in multiple applications by changing the sub-network appropriately.
- [808] arXiv:2401.06785 (cross-list from cs.CL) [ pdf , other ]
-
Title: Human-Instruction-Free LLM Self-Alignment with Limited SamplesAuthors: Hongyi Guo , Yuanshun Yao , Wei Shen , Jiaheng Wei , Xiaoying Zhang , Zhaoran Wang , Yang LiuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Aligning large language models (LLMs) with human values is a vital task for LLM practitioners. Current alignment techniques have several limitations: (1) requiring a large amount of annotated data; (2) demanding heavy human involvement; (3) lacking a systematic mechanism to continuously improve. In this work, we study aligning LLMs to a new domain with limited samples (e.g. < 100). We propose an algorithm that can self-align LLMs iteratively without active human involvement. Unlike existing works, our algorithm relies on neither human-crafted instructions nor labeled rewards, significantly reducing human involvement. In addition, our algorithm can self-improve the alignment continuously. The key idea is to first retrieve high-quality samples related to the target domain and use them as In-context Learning examples to generate more samples. Then we use the self-generated samples to finetune the LLM iteratively. We show that our method can unlock the LLMs' self-generalization ability to perform alignment with near-zero human supervision. We test our algorithm on three benchmarks in safety, truthfulness, and instruction-following, and show good performance in alignment, domain adaptability, and scalability.
- [809] arXiv:2401.06786 (cross-list from cs.DC) [ pdf , other ]
-
Title: CloudEval-YAML: A Practical Benchmark for Cloud Configuration GenerationAuthors: Yifei Xu , Yuning Chen , Xumiao Zhang , Xianshang Lin , Pan Hu , Yunfei Ma , Songwu Lu , Wan Du , Zhuoqing Mao , Ennan Zhai , Dennis CaiSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI)
Abstract: Among the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarking for code generation in cloud-native applications. In response to this need, we present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity challenge by focusing on YAML, the de facto standard of numerous cloud-native tools. We develop the CloudEval-YAML benchmark with practicality in mind: the dataset consists of hand-written problems with unit tests targeting practical scenarios. We further enhanced the dataset to meet practical needs by rephrasing questions in a concise, abbreviated, and bilingual manner. The dataset consists of 1011 problems that take more than 1200 human hours to complete. To improve practicality during evaluation, we build a scalable evaluation platform for CloudEval-YAML that achieves a 20 times speedup over a single machine. To the best of our knowledge, the CloudEval-YAML dataset is the first hand-written dataset targeting cloud-native applications. We present an in-depth evaluation of 12 LLMs, leading to a deeper understanding of the problems and LLMs, as well as effective methods to improve task performance and reduce cost.
- [810] arXiv:2401.06787 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Deep Learning Based Cyberbullying Detection in Bangla LanguageJournal-ref: Annals of Emerging Technologies in Computing (AETiC), Print ISSN: 2516-0281, Online ISSN: 2516-029X, pp. 50-65, Vol. 8, No. 1, 1st January 2024, Available: http://aetic.theiaer.org/archive/v8/v8n1/p5.htmlSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract: The Internet is currently the largest platform for global communication including expressions of opinions, reviews, contents, images, videos and so forth. Moreover, social media has now become a very broad and highly engaging platform due to its immense popularity and swift adoption trend. Increased social networking, however, also has detrimental impacts on the society leading to a range of unwanted phenomena, such as online assault, intimidation, digital bullying, criminality and trolling. Hence, cyberbullying has become a pervasive and worrying problem that poses considerable psychological and emotional harm to the people, particularly amongst the teens and the young adults. In order to lessen its negative effects and provide victims with prompt support, a great deal of research to identify cyberbullying instances at various online platforms is emerging. In comparison to other languages, Bangla (also known as Bengali) has fewer research studies in this domain. This study demonstrates a deep learning strategy for identifying cyberbullying in Bengali, using a dataset of 12282 versatile comments from multiple social media sites. In this study, a two-layer bidirectional long short-term memory (Bi-LSTM) model has been built to identify cyberbullying, using a variety of optimisers as well as 5-fold cross validation. To evaluate the functionality and efficacy of the proposed system, rigorous assessment and validation procedures have been employed throughout the project. The results of this study reveals that the proposed model's accuracy, using momentum-based stochastic gradient descent (SGD) optimiser, is 94.46%. It also reflects a higher accuracy of 95.08% and a F1 score of 95.23% using Adam optimiser as well as a better accuracy of 94.31% in 5-fold cross validation.
- [811] arXiv:2401.06789 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Information Retrieval and Classification of Real-Time Multi-Source Hurricane Evacuation NoticesSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: For an approaching disaster, the tracking of time-sensitive critical information such as hurricane evacuation notices is challenging in the United States. These notices are issued and distributed rapidly by numerous local authorities that may spread across multiple states. They often undergo frequent updates and are distributed through diverse online portals lacking standard formats. In this study, we developed an approach to timely detect and track the locally issued hurricane evacuation notices. The text data were collected mainly with a spatially targeted web scraping method. They were manually labeled and then classified using natural language processing techniques with deep learning models. The classification of mandatory evacuation notices achieved a high accuracy (recall = 96%). We used Hurricane Ian (2022) to illustrate how real-time evacuation notices extracted from local government sources could be redistributed with a Web GIS system. Our method applied to future hurricanes provides live data for situation awareness to higher-level government agencies and news media. The archived data helps scholars to study government responses toward weather warnings and individual behaviors influenced by evacuation history. The framework may be applied to other types of disasters for rapid and targeted retrieval, classification, redistribution, and archiving of real-time government orders and notifications.
- [812] arXiv:2401.06790 (cross-list from cs.CL) [ pdf , other ]
-
Title: Using Zero-shot Prompting in the Automatic Creation and Expansion of Topic Taxonomies for Tagging Retail Banking TransactionsAuthors: Daniel de S. Moraes , Pedro T. C. Santos , Polyana B. da Costa , Matheus A. S. Pinto , Ivan de J. P. Pinto , Álvaro M. G. da Veiga , Sergio Colcher , Antonio J. G. Busson , Rafael H. Rocha , Rennan Gaio , Rafael Miceli , Gabriela Tourinho , Marcos Rabaioli , Leandro Santos , Fellipe Marques , David FavaroSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This work presents an unsupervised method for automatically constructing and expanding topic taxonomies using instruction-based fine-tuned LLMs (Large Language Models). We apply topic modeling and keyword extraction techniques to create initial topic taxonomies and LLMs to post-process the resulting terms and create a hierarchy. To expand an existing taxonomy with new terms, we use zero-shot prompting to find out where to add new nodes, which, to our knowledge, is the first work to present such an approach to taxonomy tasks. We use the resulting taxonomies to assign tags that characterize merchants from a retail bank dataset. To evaluate our work, we asked 12 volunteers to answer a two-part form in which we first assessed the quality of the taxonomies created and then the tags assigned to merchants based on that taxonomy. The evaluation revealed a coherence rate exceeding 90% for the chosen taxonomies. The taxonomies' expansion with LLMs also showed exciting results for parent node prediction, with an f1-score above 70% in our taxonomies.
- [813] arXiv:2401.06791 (cross-list from cs.IR) [ pdf , other ]
-
Title: A Span-based Model for Extracting Overlapping PICO Entities from RCT PublicationsSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Objectives Extraction of PICO (Populations, Interventions, Comparison, and Outcomes) entities is fundamental to evidence retrieval. We present a novel method PICOX to extract overlapping PICO entities.
Materials and Methods PICOX first identifies entities by assessing whether a word marks the beginning or conclusion of an entity. Then it uses a multi-label classifier to assign one or more PICO labels to a span candidate. PICOX was evaluated using one of the best-performing baselines, EBM-NLP, and three more datasets, i.e., PICO-Corpus, and RCT publications on Alzheimer's Disease or COVID-19, using entity-level precision, recall, and F1 scores.
Results PICOX achieved superior precision, recall, and F1 scores across the board, with the micro F1 score improving from 45.05 to 50.87 (p << 0.01). On the PICO-Corpus, PICOX obtained higher recall and F1 scores than the baseline and improved the micro recall score from 56.66 to 67.33. On the COVID-19 dataset, PICOX also outperformed the baseline and improved the micro F1 score from 77.10 to 80.32. On the AD dataset, PICOX demonstrated comparable F1 scores with higher precision when compared to the baseline.
Conclusion PICOX excels in identifying overlapping entities and consistently surpasses a leading baseline across multiple datasets. Ablation studies reveal that its data augmentation strategy effectively minimizes false positives and improves precision. - [814] arXiv:2401.06792 (cross-list from cs.CL) [ pdf , other ]
-
Title: LightHouse: A Survey of AGI HallucinationAuthors: Feng WangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: With the development of artificial intelligence, large-scale models have become increasingly intelligent. However, numerous studies indicate that hallucinations within these large models are a bottleneck hindering the development of AI research. In the pursuit of achieving strong artificial intelligence, a significant volume of research effort is being invested in the AGI (Artificial General Intelligence) hallucination research. Previous explorations have been conducted in researching hallucinations within LLMs (Large Language Models). As for multimodal AGI, research on hallucinations is still in an early stage. To further the progress of research in the domain of hallucinatory phenomena, we present a bird's eye view of hallucinations in AGI, summarizing the current work on AGI hallucinations and proposing some directions for future research.
- [815] arXiv:2401.06795 (cross-list from cs.CL) [ pdf , other ]
-
Title: AI and Generative AI for Research Discovery and SummarizationComments: 29 pages, 9 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: AI and generative AI tools, including chatbots like ChatGPT that rely on large language models (LLMs), have burst onto the scene this year, creating incredible opportunities to increase work productivity and improve our lives. Statisticians and data scientists have begun experiencing the benefits from the availability of these tools in numerous ways, such as the generation of programming code from text prompts to analyze data or fit statistical models. One area that these tools can make a substantial impact is in research discovery and summarization. Standalone tools and plugins to chatbots are being developed that allow researchers to more quickly find relevant literature than pre-2023 search tools. Furthermore, generative AI tools have improved to the point where they can summarize and extract the key points from research articles in succinct language. Finally, chatbots based on highly parameterized LLMs can be used to simulate abductive reasoning, which provides researchers the ability to make connections among related technical topics, which can also be used for research discovery. We review the developments in AI and generative AI for research discovery and summarization, and propose directions where these types of tools are likely to head in the future that may be of interest to statistician and data scientists.
- [816] arXiv:2401.06796 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: AI Hallucinations: A Misnomer Worth ClarifyingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: As large language models continue to advance in Artificial Intelligence (AI), text generation systems have been shown to suffer from a problematic phenomenon termed often as "hallucination." However, with AI's increasing presence across various domains including medicine, concerns have arisen regarding the use of the term itself. In this study, we conducted a systematic review to identify papers defining "AI hallucination" across fourteen databases. We present and analyze definitions obtained across all databases, categorize them based on their applications, and extract key points within each category. Our results highlight a lack of consistency in how the term is used, but also help identify several alternative terms in the literature. We discuss implications of these and call for a more unified effort to bring consistency to an important contemporary AI issue that can affect multiple domains significantly.
- [817] arXiv:2401.06800 (cross-list from cs.CL) [ pdf , other ]
-
Title: Reinforcement Learning for Optimizing RAG for Domain ChatbotsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: With the advent of Large Language Models (LLM), conversational assistants have become prevalent for domain use cases. LLMs acquire the ability to contextual question answering through training, and Retrieval Augmented Generation (RAG) further enables the bot to answer domain-specific questions. This paper describes a RAG-based approach for building a chatbot that answers user's queries using Frequently Asked Questions (FAQ) data. We train an in-house retrieval embedding model using infoNCE loss, and experimental results demonstrate that the in-house model works significantly better than the well-known general-purpose public embedding model, both in terms of retrieval accuracy and Out-of-Domain (OOD) query detection. As an LLM, we use an open API-based paid ChatGPT model. We noticed that a previously retrieved-context could be used to generate an answer for specific patterns/sequences of queries (e.g., follow-up queries). Hence, there is a scope to optimize the number of LLM tokens and cost. Assuming a fixed retrieval model and an LLM, we optimize the number of LLM tokens using Reinforcement Learning (RL). Specifically, we propose a policy-based model external to the RAG, which interacts with the RAG pipeline through policy actions and updates the policy to optimize the cost. The policy model can perform two actions: to fetch FAQ context or skip retrieval. We use the open API-based GPT-4 as the reward model. We then train a policy model using policy gradient on multiple training chat sessions. As a policy model, we experimented with a public gpt-2 model and an in-house BERT model. With the proposed RL-based optimization combined with similarity threshold, we are able to achieve significant cost savings while getting a slightly improved accuracy. Though we demonstrate results for the FAQ chatbot, the proposed RL approach is generic and can be experimented with any existing RAG pipeline.
- [818] arXiv:2401.06804 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: ChatGPT, Let us Chat Sign Language: Experiments, Architectural Elements, Challenges and Research DirectionsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: ChatGPT is a language model based on Generative AI. Existing research work on ChatGPT focused on its use in various domains. However, its potential for Sign Language Translation (SLT) is yet to be explored. This paper addresses this void. Therefore, we present GPT's evolution aiming a retrospective analysis of the improvements to its architecture for SLT. We explore ChatGPT's capabilities in translating different sign languages in paving the way to better accessibility for deaf and hard-of-hearing community. Our experimental results indicate that ChatGPT can accurately translate from English to American (ASL), Australian (AUSLAN), and British (BSL) sign languages and from Arabic Sign Language (ArSL) to English with only one prompt iteration. However, the model failed to translate from Arabic to ArSL and ASL, AUSLAN, and BSL to Arabic. Consequently, we present challenges and derive insights for future research directions.
- [819] arXiv:2401.06805 (cross-list from cs.CL) [ pdf , other ]
-
Title: Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal ReasoningAuthors: Yiqi Wang , Wentao Chen , Xiaotian Han , Xudong Lin , Haiteng Zhao , Yongfei Liu , Bohan Zhai , Jianbo Yuan , Quanzeng You , Hongxia YangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI) with abstract reasoning ability is the goal of next-generation AI. Recent advancements in Large Language Models (LLMs), along with the emerging field of Multimodal Large Language Models (MLLMs), have demonstrated impressive capabilities across a wide range of multimodal tasks and applications. Particularly, various MLLMs, each with distinct model architectures, training data, and training stages, have been evaluated across a broad range of MLLM benchmarks. These studies have, to varying degrees, revealed different aspects of the current capabilities of MLLMs. However, the reasoning abilities of MLLMs have not been systematically investigated. In this survey, we comprehensively review the existing evaluation protocols of multimodal reasoning, categorize and illustrate the frontiers of MLLMs, introduce recent trends in applications of MLLMs on reasoning-intensive tasks, and finally discuss current practices and future directions. We believe our survey establishes a solid base and sheds light on this important topic, multimodal reasoning.
- [820] arXiv:2401.06806 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: AugSumm: towards generalizable speech summarization using synthetic labels from large language modelComments: This work has been submitted to the IEEE ICASSP for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. 5 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Abstractive speech summarization (SSUM) aims to generate human-like summaries from speech. Given variations in information captured and phrasing, recordings can be summarized in multiple ways. Therefore, it is more reasonable to consider a probabilistic distribution of all potential summaries rather than a single summary. However, conventional SSUM models are mostly trained and evaluated with a single ground-truth (GT) human-annotated deterministic summary for every recording. Generating multiple human references would be ideal to better represent the distribution statistically, but is impractical because annotation is expensive. We tackle this challenge by proposing AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries for training and evaluation. First, we explore prompting strategies to generate synthetic summaries from ChatGPT. We validate the quality of synthetic summaries using multiple metrics including human evaluation, where we find that summaries generated using AugSumm are perceived as more valid to humans. Second, we develop methods to utilize synthetic summaries in training and evaluation. Experiments on How2 demonstrate that pre-training on synthetic summaries and fine-tuning on GT summaries improves ROUGE-L by 1 point on both GT and AugSumm-based test sets. AugSumm summaries are available at this https URL .
- [821] arXiv:2401.06807 (cross-list from cs.CL) [ pdf , other ]
-
Title: An EcoSage Assistant: Towards Building A Multimodal Plant Care Dialogue AssistantSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In recent times, there has been an increasing awareness about imminent environmental challenges, resulting in people showing a stronger dedication to taking care of the environment and nurturing green life. The current $19.6 billion indoor gardening industry, reflective of this growing sentiment, not only signifies a monetary value but also speaks of a profound human desire to reconnect with the natural world. However, several recent surveys cast a revealing light on the fate of plants within our care, with more than half succumbing primarily due to the silent menace of improper care. Thus, the need for accessible expertise capable of assisting and guiding individuals through the intricacies of plant care has become paramount more than ever. In this work, we make the very first attempt at building a plant care assistant, which aims to assist people with plant(-ing) concerns through conversations. We propose a plant care conversational dataset named Plantational, which contains around 1K dialogues between users and plant care experts. Our end-to-end proposed approach is two-fold : (i) We first benchmark the dataset with the help of various large language models (LLMs) and visual language model (VLM) by studying the impact of instruction tuning (zero-shot and few-shot prompting) and fine-tuning techniques on this task; (ii) finally, we build EcoSage, a multi-modal plant care assisting dialogue generation framework, incorporating an adapter-based modality infusion using a gated mechanism. We performed an extensive examination (both automated and manual evaluation) of the performance exhibited by various LLMs and VLM in the generation of the domain-specific dialogue responses to underscore the respective strengths and weaknesses of these diverse models.
- [822] arXiv:2401.06808 (cross-list from cs.CL) [ pdf , other ]
-
Title: Grounded learning for compositional vector semanticsAuthors: Martha LewisSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: Categorical compositional distributional semantics is an approach to modelling language that combines the success of vector-based models of meaning with the compositional power of formal semantics. However, this approach was developed without an eye to cognitive plausibility. Vector representations of concepts and concept binding are also of interest in cognitive science, and have been proposed as a way of representing concepts within a biologically plausible spiking neural network. This work proposes a way for compositional distributional semantics to be implemented within a spiking neural network architecture, with the potential to address problems in concept binding, and give a small implementation. We also describe a means of training word representations using labelled images.
- [823] arXiv:2401.06811 (cross-list from cs.IR) [ pdf , other ]
-
Title: UniRQR: A Unified Model for Retrieval Decision, Query, and Response Generation in Internet-Based Knowledge Dialogue SystemsSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Knowledge-based dialogue systems with internet retrieval have recently attracted considerable attention from researchers. The dialogue systems overcome a major limitation of traditional knowledge dialogue systems, where the timeliness of knowledge cannot be assured, hence providing greater practical application value. Knowledge-based dialogue systems with internet retrieval can be typically segmented into three tasks: Retrieval Decision, Query Generation, and Response Generation. However, many of studies assumed that all conversations require external knowledge to continue, neglecting the critical step of determining when retrieval is necessary. This assumption often leads to an over-dependence on external knowledge, even when it may not be required. Our work addresses this oversight by employing a single unified model facilitated by prompt and multi-task learning approaches. This model not only decides whether retrieval is necessary but also generates retrieval queries and responses. By integrating these functions, our system leverages the full potential of pre-trained models and reduces the complexity and costs associated with deploying multiple models. We conducted extensive experiments to investigate the mutual enhancement among the three tasks in our system. What is more, the experiment results on the Wizint and Dusinc datasets not only demonstrate that our unified model surpasses the baseline performance for individual tasks, but also reveal that it achieves comparable results when contrasted with SOTA systems that deploy separate, specialized models for each task.
- [824] arXiv:2401.06816 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: When ChatGPT is gone: Creativity reverts and homogeneity persistsComments: 30pages,6figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: ChatGPT has been evidenced to enhance human performance in creative tasks. Yet, it is still unclear if this boosting effect sustains with and without ChatGPT. In a pre-registered seven-day lab experiment and a follow-up survey after 30 days of experiment completion, we examined the impacts of ChatGPT presence and absence on sustained creativity using a text dataset of 3302 creative ideas and 427 creative solutions from 61 college students. Participants in the treatment group used ChatGPT in creative tasks, while those in the control group completed the tasks by themselves. The findings show that although the boosting effect of ChatGPT was consistently observed over a five-day creative journey, human creative performance reverted to baseline when ChatGPT was down on the 7th and the 30th day. More critically, the use of ChatGPT in creative tasks resulted in increasingly homogenized contents, and this homogenization effect persisted even when ChatGPT was absence. These findings pose a challenge to the prevailing argument that ChatGPT can enhance human creativity. In fact, generative AI like ChatGPT lends to human with a temporary rise in creative performance but boxes human creative capability in the long run, highlighting the imperative for cautious generative AI integration in creative endeavors.
- [825] arXiv:2401.06821 (cross-list from cs.LG) [ pdf , other ]
-
Title: Surrogate Neural Networks Local Stability for Aircraft Predictive MaintenanceAuthors: Mélanie Ducoffe , Guillaume Povéda , Audrey Galametz , Ryma Boumazouza , Marion-Cécile Martin , Julien Baris , Derk Daverschot , Eugene O'HigginsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Surrogate Neural Networks (NN) now routinely serve as substitutes for computationally demanding simulations (e.g., finite element). They enable faster analyses in industrial applications e.g., manufacturing processes, performance assessment. The verification of surrogate models is a critical step to assess their robustness under different scenarios. We explore the combination of empirical and formal methods in one NN verification pipeline. We showcase its efficiency on an industrial use case of aircraft predictive maintenance. We assess the local stability of surrogate NN designed to predict the stress sustained by an aircraft part from external loads. Our contribution lies in the complete verification of the surrogate models that possess a high-dimensional input and output space, thus accommodating multi-objective constraints. We also demonstrate the pipeline effectiveness in substantially decreasing the runtime needed to assess the targeted property.
- [826] arXiv:2401.06824 (cross-list from cs.CL) [ pdf , other ]
-
Title: Open the Pandora's Box of LLMs: Jailbreaking LLMs through Representation EngineeringAuthors: Tianlong Li , Shihan Dou , Wenhao Liu , Muling Wu , Changze Lv , Xiaoqing Zheng , Xuanjing HuangComments: 13 pages, 9 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Jailbreaking techniques aim to probe the boundaries of safety in large language models (LLMs) by inducing them to generate toxic responses to malicious queries, a significant concern within the LLM community. While existing jailbreaking methods primarily rely on prompt engineering, altering inputs to evade LLM safety mechanisms, they suffer from low attack success rates and significant time overheads, rendering them inflexible. To overcome these limitations, we propose a novel jailbreaking approach, named Jailbreaking LLMs through Representation Engineering (JRE). Our method requires only a small number of query pairs to extract ``safety patterns'' that can be used to circumvent the target model's defenses, achieving unprecedented jailbreaking performance. Building upon these findings, we also introduce a novel defense framework inspired by JRE principles, which demonstrates notable effectiveness. Extensive experimentation confirms the superior performance of the JRE attacks and the robustness of the JRE defense framework. We hope this study contributes to advancing the understanding of model safety issues through the lens of representation engineering.
- [827] arXiv:2401.06826 (cross-list from cs.LG) [ pdf , other ]
-
Title: Direct Distillation between Different DomainsAuthors: Jialiang Tang , Shuo Chen , Gang Niu , Hongyuan Zhu , Joey Tianyi Zhou , Chen Gong , Masashi SugiyamaSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Knowledge Distillation (KD) aims to learn a compact student network using knowledge from a large pre-trained teacher network, where both networks are trained on data from the same distribution. However, in practical applications, the student network may be required to perform in a new scenario (i.e., the target domain), which usually exhibits significant differences from the known scenario of the teacher network (i.e., the source domain). The traditional domain adaptation techniques can be integrated with KD in a two-stage process to bridge the domain gap, but the ultimate reliability of two-stage approaches tends to be limited due to the high computational consumption and the additional errors accumulated from both stages. To solve this problem, we propose a new one-stage method dubbed ``Direct Distillation between Different Domains" (4Ds). We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge. Then, we build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network, while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data. As a result, the teacher network can effectively transfer categorical knowledge that aligns with the target domain of the student network. Intensive experiments on various benchmark datasets demonstrate that our proposed 4Ds method successfully produces reliable student networks and outperforms state-of-the-art approaches.
- [828] arXiv:2401.06827 (cross-list from cs.CV) [ pdf , other ]
-
Title: APLe: Token-Wise Adaptive for Multi-Modal Prompt LearningComments: 7 pages,3 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Pre-trained Vision-Language (V-L) models set the benchmark for generalization to downstream tasks among the noteworthy contenders. Many characteristics of the V-L model have been explored in existing research including the challenge of the sensitivity to text input and the tuning process across multi-modal prompts. With the advanced utilization of the V-L model like CLIP, recent approaches deploy learnable prompts instead of hand-craft prompts to boost the generalization performance and address the aforementioned challenges. Inspired by layer-wise training, which is wildly used in image fusion, we note that using a sequential training process to adapt different modalities branches of CLIP efficiently facilitates the improvement of generalization. In the context of addressing the multi-modal prompting challenge, we propose Token-wise Adaptive for Multi-modal Prompt Learning (APLe) for tuning both modalities prompts, vision and language, as tokens in a sequential manner. APLe addresses the challenges in V-L models to promote prompt learning across both modalities, which indicates a competitive generalization performance in line with the state-of-the-art. Preeminently, APLe shows robustness and favourable performance in prompt-length experiments with an absolute advantage in adopting the V-L models.
- [829] arXiv:2401.06829 (cross-list from cs.CL) [ pdf , other ]
-
Title: Cross-Attention Watermarking of Large Language ModelsComments: 5 pages, 3 figures. Accepted to ICASSP 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: A new approach to linguistic watermarking of language models is presented in which information is imperceptibly inserted into the output text while preserving its readability and original meaning. A cross-attention mechanism is used to embed watermarks in the text during inference. Two methods using cross-attention are presented that minimize the effect of watermarking on the performance of a pretrained model. Exploration of different training strategies for optimizing the watermarking and of the challenges and implications of applying this approach in real-world scenarios clarified the tradeoff between watermark robustness and text quality. Watermark selection substantially affects the generated output for high entropy sentences. This proactive watermarking approach has potential application in future model development.
- [830] arXiv:2401.06830 (cross-list from cs.IR) [ pdf , other ]
-
Title: RecSys Challenge 2023: From data preparation to prediction, a simple, efficient, robust and scalable solutionSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The RecSys Challenge 2023, presented by ShareChat, consists to predict if an user will install an application on his smartphone after having seen advertising impressions in ShareChat & Moj apps. This paper presents the solution of 'Team UMONS' to this challenge, giving accurate results (our best score is 6.622686) with a relatively small model that can be easily implemented in different production configurations. Our solution scales well when increasing the dataset size and can be used with datasets containing missing values.
- [831] arXiv:2401.06831 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: A Survey on the Applications of Frontier AI, Foundation Models, and Large Language Models to Intelligent Transportation SystemsComments: This paper appears in International Conference on Computer and Applications (ICCA) 2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This survey paper explores the transformative influence of frontier AI, foundation models, and Large Language Models (LLMs) in the realm of Intelligent Transportation Systems (ITS), emphasizing their integral role in advancing transportation intelligence, optimizing traffic management, and contributing to the realization of smart cities. Frontier AI refers to the forefront of AI technology, encompassing the latest advancements, innovations, and experimental techniques in the field, especially AI foundation models and LLMs. Foundation models, like GPT-4, are large, general-purpose AI models that provide a base for a wide range of applications. They are characterized by their versatility and scalability. LLMs are obtained from finetuning foundation models with a specific focus on processing and generating natural language. They excel in tasks like language understanding, text generation, translation, and summarization. By leveraging vast textual data, including traffic reports and social media interactions, LLMs extract critical insights, fostering the evolution of ITS. The survey navigates the dynamic synergy between LLMs and ITS, delving into applications in traffic management, integration into autonomous vehicles, and their role in shaping smart cities. It provides insights into ongoing research, innovations, and emerging trends, aiming to inspire collaboration at the intersection of language, intelligence, and mobility for safer, more efficient, and sustainable transportation systems. The paper further surveys interactions between LLMs and various aspects of ITS, exploring roles in traffic management, facilitating autonomous vehicles, and contributing to smart city development, while addressing challenges brought by frontier AI and foundation models. This paper offers valuable inspiration for future research and innovation in the transformative domain of intelligent transportation.
- [832] arXiv:2401.06833 (cross-list from cs.SY) [ pdf , other ]
-
Title: A hierarchical control framework for autonomous decision-making systems: Integrating HMDP and MPCComments: 11 pages, 14 figures, submitted to AutomaticaSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: This paper proposes a comprehensive hierarchical control framework for autonomous decision-making arising in robotics and autonomous systems. In a typical hierarchical control architecture, high-level decision making is often characterised by discrete state and decision/control sets. However, a rational decision is usually affected by not only the discrete states of the autonomous system, but also the underlying continuous dynamics even the evolution of its operational environment. This paper proposes a holistic and comprehensive design process and framework for this type of challenging problems, from new modelling and design problem formulation to control design and stability analysis. It addresses the intricate interplay between traditional continuous systems dynamics utilized at the low levels for control design and discrete Markov decision processes (MDP) for facilitating high-level decision making. We model the decision making system in complex environments as a hybrid system consisting of a controlled MDP and autonomous (i.e. uncontrolled) continuous dynamics. Consequently, the new formulation is called as hybrid Markov decision process (HMDP). The design problem is formulated with a focus on ensuring both safety and optimality while taking into account the influence of both the discrete and continuous state variables of different levels. With the help of the model predictive control (MPC) concept, a decision maker design scheme is proposed for the proposed hybrid decision making model. By carefully designing key ingredients involved in this scheme, it is shown that the recursive feasibility and stability of the proposed autonomous decision making scheme are guaranteed. The proposed framework is applied to develop an autonomous lane changing system for intelligent vehicles.
- [833] arXiv:2401.06836 (cross-list from cs.CL) [ pdf , other ]
-
Title: Enhancing Emotional Generation Capability of Large Language Models via Emotional Chain-of-ThoughtSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have shown remarkable performance in various emotion recognition tasks, thereby piquing the research community's curiosity for exploring their potential in emotional intelligence. However, several issues in the field of emotional generation tasks remain unresolved, including human preference alignment and emotional generation assessment. In this paper, we propose the Emotional Chain-of-Thought (ECoT), a plug-and-play prompting method that enhances the performance of LLMs on various emotional generation tasks by aligning with human emotional intelligence guidelines. To assess the reliability of ECoT, we propose an automated model-based evaluation method called Emotional Generation Score (EGS). EGS incorporates Goleman's Emotional Intelligence Theory as a consensus of human experts, providing a new perspective on the evaluation of emotional generation tasks. Extensive experimental results demonstrate the effectiveness of ECoT and EGS. Further, we discuss the promise of LLMs in the field of emotional intelligence and present key insights into the LLMs with the ECoT in emotional generation tasks.
- [834] arXiv:2401.06837 (cross-list from cs.CL) [ pdf , other ]
-
Title: Structsum Generation for Faster Text ComprehensionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We consider the task of generating structured representations of text using large language models (LLMs). We focus on tables and mind maps as representative modalities. Tables are more organized way of representing data, while mind maps provide a visually dynamic and flexible approach, particularly suitable for sparse content. Despite the effectiveness of LLMs on different tasks, we show that current models struggle with generating structured outputs. In response, we present effective prompting strategies for both of these tasks. We introduce a taxonomy of problems around factuality, global and local structure, common to both modalities and propose a set of critiques to tackle these issues resulting in an absolute improvement in accuracy of +37pp (79%) for mind maps and +15pp (78%) for tables. To evaluate semantic coverage of generated structured representations we propose Auto-QA, and we verify the adequacy of Auto-QA using SQuAD dataset. We further evaluate the usefulness of structured representations via a text comprehension user study. The results show a significant reduction in comprehension time compared to text when using table (42.9%) and mind map (31.9%), without loss in accuracy.
- [835] arXiv:2401.06866 (cross-list from cs.CL) [ pdf , other ]
-
Title: Health-LLM: Large Language Models for Health Prediction via Wearable Sensor DataSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) are capable of many natural language tasks, yet they are far from perfect. In health applications, grounding and interpreting domain-specific and non-linguistic data is important. This paper investigates the capacity of LLMs to deliver multi-modal health predictions based on contextual information (e.g. user demographics, health knowledge) and physiological data (e.g. resting heart rate, sleep minutes). We present a comprehensive evaluation of eight state-of-the-art LLMs with diverse prompting and fine-tuning techniques on six public health datasets (PM-Data, LifeSnaps, GLOBEM, AW_FB, MIT-BIH & MIMIC-III). Our experiments cover thirteen consumer health prediction tasks in mental health, activity, metabolic, sleep, and cardiac assessment. Our fine-tuned model, Health-Alpaca exhibits comparable performance to larger models (GPT-3.5 and GPT-4), achieving the best performance in 5 out of 13 tasks. Ablation studies highlight the effectiveness of context enhancement strategies, and generalization capability of the fine-tuned models across training datasets and the size of training samples. Notably, we observe that our context enhancement can yield up to 23.8% improvement in performance. While constructing contextually rich prompts (combining user context, health knowledge and temporal information) exhibits synergistic improvement, the inclusion of health knowledge context in prompts significantly enhances overall performance.
- [836] arXiv:2401.06868 (cross-list from cs.LG) [ pdf , other ]
-
Title: Multicriteria decision support employing adaptive prediction in a tensor-based feature representationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Multicriteria decision analysis (MCDA) is a widely used tool to support decisions in which a set of alternatives should be ranked or classified based on multiple criteria. Recent studies in MCDA have shown the relevance of considering not only current evaluations of each criterion but also past data. Past-data-based approaches carry new challenges, especially in time-varying environments. This study deals with this challenge via essential tools of signal processing, such as tensorial representations and adaptive prediction. More specifically, we structure the criteria' past data as a tensor and, by applying adaptive prediction, we compose signals with these prediction values of the criteria. Besides, we transform the prediction in the time domain into a most favorable decision making domain, called the feature domain. We present a novel extension of the MCDA method PROMETHEE II, aimed at addressing the tensor in the feature domain to obtain a ranking of alternatives. Numerical experiments were performed using real-world time series, and our approach is compared with other existing strategies. The results highlight the relevance and efficiency of our proposal, especially for nonstationary time series.
- [837] arXiv:2401.06883 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Scaling While Privacy Preserving: A Comprehensive Synthetic Tabular Data Generation and Evaluation in Learning AnalyticsSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Privacy poses a significant obstacle to the progress of learning analytics (LA), presenting challenges like inadequate anonymization and data misuse that current solutions struggle to address. Synthetic data emerges as a potential remedy, offering robust privacy protection. However, prior LA research on synthetic data lacks thorough evaluation, essential for assessing the delicate balance between privacy and data utility. Synthetic data must not only enhance privacy but also remain practical for data analytics. Moreover, diverse LA scenarios come with varying privacy and utility needs, making the selection of an appropriate synthetic data approach a pressing challenge. To address these gaps, we propose a comprehensive evaluation of synthetic data, which encompasses three dimensions of synthetic data quality, namely resemblance, utility, and privacy. We apply this evaluation to three distinct LA datasets, using three different synthetic data generation methods. Our results show that synthetic data can maintain similar utility (i.e., predictive performance) as real data, while preserving privacy. Furthermore, considering different privacy and data utility requirements in different LA scenarios, we make customized recommendations for synthetic data generation. This paper not only presents a comprehensive evaluation of synthetic data but also illustrates its potential in mitigating privacy concerns within the field of LA, thus contributing to a wider application of synthetic data in LA and promoting a better practice for open science.
- [838] arXiv:2401.06915 (cross-list from cs.CL) [ pdf , other ]
-
Title: DocFinQA: A Long-Context Financial Reasoning DatasetAuthors: Varshini Reddy , Rik Koncel-Kedziorski , Viet Dac Lai , Michael Krumdick , Charles Lovering , Chris TannerComments: 13 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: For large language models (LLMs) to be effective in the financial domain -- where each decision can have a significant impact -- it is necessary to investigate realistic tasks and data. Financial professionals often interact with documents that are hundreds of pages long, but most financial research datasets only deal with short excerpts from these documents. To address this, we introduce a long-document financial QA task. We augment 7,437 questions from the existing FinQA dataset with the full-document context, extending the average context length from under 700 words in FinQA to 123k words in DocFinQA. We conduct extensive experiments over retrieval-based QA pipelines and long-context language models. DocFinQA proves a significant challenge for even state-of-the-art systems. We also provide a case-study on the longest documents in DocFinQA and find that models particularly struggle on these documents. Addressing these challenges may have a wide reaching impact across applications where specificity and long-range contexts are critical, like gene sequences and legal document contract analysis.
- [839] arXiv:2401.06922 (cross-list from cs.LG) [ pdf , other ]
-
Title: Open RAN LSTM Traffic Prediction and Slice Management using Deep Reinforcement LearningComments: Accepted to publish in the IEEE Asilomar Conference on Signals, Systems, and Computers, 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY); Machine Learning (stat.ML)
Abstract: With emerging applications such as autonomous driving, smart cities, and smart factories, network slicing has become an essential component of 5G and beyond networks as a means of catering to a service-aware network. However, managing different network slices while maintaining quality of services (QoS) is a challenge in a dynamic environment. To address this issue, this paper leverages the heterogeneous experiences of distributed units (DUs) in ORAN systems and introduces a novel approach to ORAN slicing xApp using distributed deep reinforcement learning (DDRL). Additionally, to enhance the decision-making performance of the RL agent, a prediction rApp based on long short-term memory (LSTM) is incorporated to provide additional information from the dynamic environment to the xApp. Simulation results demonstrate significant improvements in network performance, particularly in reducing QoS violations. This emphasizes the importance of using the prediction rApp and distributed actors' information jointly as part of a dynamic xApp.
- [840] arXiv:2401.06946 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: 3D Object Detection and High-Resolution Traffic Parameters Extraction Using Low-Resolution LiDAR DataComments: 19 pages, 11 figures. This paper has been submitted for consideration for presentation at the 103rd Annual Meeting of the Transportation Research Board, January 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Traffic volume data collection is a crucial aspect of transportation engineering and urban planning, as it provides vital insights into traffic patterns, congestion, and infrastructure efficiency. Traditional manual methods of traffic data collection are both time-consuming and costly. However, the emergence of modern technologies, particularly Light Detection and Ranging (LiDAR), has revolutionized the process by enabling efficient and accurate data collection. Despite the benefits of using LiDAR for traffic data collection, previous studies have identified two major limitations that have impeded its widespread adoption. These are the need for multiple LiDAR systems to obtain complete point cloud information of objects of interest, as well as the labor-intensive process of annotating 3D bounding boxes for object detection tasks. In response to these challenges, the current study proposes an innovative framework that alleviates the need for multiple LiDAR systems and simplifies the laborious 3D annotation process. To achieve this goal, the study employed a single LiDAR system, that aims at reducing the data acquisition cost and addressed its accompanying limitation of missing point cloud information by developing a Point Cloud Completion (PCC) framework to fill in missing point cloud information using point density. Furthermore, we also used zero-shot learning techniques to detect vehicles and pedestrians, as well as proposed a unique framework for extracting low to high features from the object of interest, such as height, acceleration, and speed. Using the 2D bounding box detection and extracted height information, this study is able to generate 3D bounding boxes automatically without human intervention.
- [841] arXiv:2401.06947 (cross-list from cs.CL) [ pdf , other ]
-
Title: Parameter-Efficient Detoxification with Contrastive DecodingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The field of natural language generation has witnessed significant advancements in recent years, including the development of controllable text generation techniques. However, controlling the attributes of the generated text remains a challenge, especially when aiming to avoid undesirable behavior such as toxicity. In this work, we introduce Detoxification Generator (DETOXIGEN), an inference-time algorithm that steers the generation away from unwanted styles. DETOXIGEN is an ensemble of a pre-trained language model (generator) and a detoxifier. The detoxifier is trained intentionally on the toxic data representative of the undesirable attribute, encouraging it to generate text in that style exclusively. During the actual generation, we use the trained detoxifier to produce undesirable tokens for the generator to contrast against at each decoding step. This approach directly informs the generator to avoid generating tokens that the detoxifier considers highly likely. We evaluate DETOXIGEN on the commonly used REALTOXICITYPROMPTS benchmark (Gehman et al., 2020) with various language models as generators. We find that it significantly outperforms previous approaches in detoxification metrics while not compromising on the generation quality. Moreover, the detoxifier is obtained by soft prompt-tuning using the same backbone language model as the generator. Hence, DETOXIGEN requires only a tiny amount of extra weights from the virtual tokens of the detoxifier to be loaded into GPU memory while decoding, making it a promising lightweight, practical, and parameter-efficient detoxification strategy.
- [842] arXiv:2401.06949 (cross-list from cs.RO) [ pdf , other ]
-
Title: ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and CharacterizationAuthors: Kourosh Darvish , Marta Skreta , Yuchi Zhao , Naruki Yoshikawa , Sagnik Som , Miroslav Bogdanovic , Yang Cao , Han Hao , Haoping Xu , Alán Aspuru-Guzik , Animesh Garg , Florian ShkurtiSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Chemistry experimentation is often resource- and labor-intensive. Despite the many benefits incurred by the integration of advanced and special-purpose lab equipment, many aspects of experimentation are still manually conducted by chemists, for example, polishing an electrode in electrochemistry experiments. Traditional lab automation infrastructure faces challenges when it comes to flexibly adapting to new chemistry experiments. To address this issue, we propose a human-friendly and flexible robotic system, ORGANA, that automates a diverse set of chemistry experiments. It is capable of interacting with chemists in the lab through natural language, using Large Language Models (LLMs). ORGANA keeps scientists informed by providing timely reports that incorporate statistical analyses. Additionally, it actively engages with users when necessary for disambiguation or troubleshooting. ORGANA can reason over user input to derive experiment goals, and plan long sequences of both high-level tasks and low-level robot actions while using feedback from the visual perception of the environment. It also supports scheduling and parallel execution for experiments that require resource allocation and coordination between multiple robots and experiment stations. We show that ORGANA successfully conducts a diverse set of chemistry experiments, including solubility assessment, pH measurement, recrystallization, and electrochemistry experiments. For the latter, we show that ORGANA robustly executes a long-horizon plan, comprising 19 steps executed in parallel, to characterize the electrochemical properties of quinone derivatives, a class of molecules used in rechargeable flow batteries. Our user study indicates that ORGANA significantly improves many aspects of user experience while reducing their physical workload. More details about ORGANA can be found at this https URL .
- [843] arXiv:2401.06951 (cross-list from cs.CL) [ pdf , other ]
-
Title: E^2-LLM: Efficient and Extreme Length Extension of Large Language ModelsAuthors: Jiaheng Liu , Zhiqi Bai , Yuanxing Zhang , Chenchen Zhang , Yu Zhang , Ge Zhang , Jiakai Wang , Haoran Que , Yukang Chen , Wenbo Su , Tiezheng Ge , Jie Fu , Wenhu Chen , Bo ZhengSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support corresponding long-context windows, where the long-context training data (e.g., 32k) is needed, and high GPU training costs are assumed. To address the aforementioned issues, we propose an Efficient and Extreme length extension method for Large Language Models, called E 2 -LLM, with only one training procedure and dramatically reduced computation cost, which also removes the need to collect long-context data. Concretely, first, the training data of our E 2 -LLM only requires a short length (e.g., 4k), which reduces the tuning cost greatly. Second, the training procedure on the short training context window is performed only once time, and we can support different evaluation context windows at inference. Third, in E 2 - LLM, based on RoPE position embeddings, we introduce two different augmentation methods on the scale and position index parameters for different samples in training. It aims to make the model more robust to the different relative differences when directly interpolating the arbitrary context length at inference. Comprehensive experimental results on multiple benchmark datasets demonstrate the effectiveness of our E 2 -LLM on challenging long-context tasks.
- [844] arXiv:2401.06952 (cross-list from cs.LG) [ pdf , other ]
-
Title: Reinforcement Learning for Scalable Train Timetable Rescheduling with Graph RepresentationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Train timetable rescheduling (TTR) aims to promptly restore the original operation of trains after unexpected disturbances or disruptions. Currently, this work is still done manually by train dispatchers, which is challenging to maintain performance under various problem instances. To mitigate this issue, this study proposes a reinforcement learning-based approach to TTR, which makes the following contributions compared to existing work. First, we design a simple directed graph to represent the TTR problem, enabling the automatic extraction of informative states through graph neural networks. Second, we reformulate the construction process of TTR's solution, not only decoupling the decision model from the problem size but also ensuring the generated scheme's feasibility. Third, we design a learning curriculum for our model to handle the scenarios with different levels of delay. Finally, a simple local search method is proposed to assist the learned decision model, which can significantly improve solution quality with little additional computation cost, further enhancing the practical value of our method. Extensive experimental results demonstrate the effectiveness of our method. The learned decision model can achieve better performance for various problems with varying degrees of train delay and different scales when compared to handcrafted rules and state-of-the-art solvers.
- [845] arXiv:2401.06960 (cross-list from cs.CV) [ pdf , other ]
-
Title: Transformer for Object Re-Identification: A SurveySubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Object Re-Identification (Re-ID) aims to identify and retrieve specific objects from varying viewpoints. For a prolonged period, this field has been predominantly driven by deep convolutional neural networks. In recent years, the Transformer has witnessed remarkable advancements in computer vision, prompting an increasing body of research to delve into the application of Transformer in Re-ID. This paper provides a comprehensive review and in-depth analysis of the Transformer-based Re-ID. In categorizing existing works into Image/Video-Based Re-ID, Re-ID with limited data/annotations, Cross-Modal Re-ID, and Special Re-ID Scenarios, we thoroughly elucidate the advantages demonstrated by the Transformer in addressing a multitude of challenges across these domains. Considering the trending unsupervised Re-ID, we propose a new Transformer baseline, UntransReID, achieving state-of-the-art performance on both single-/cross modal tasks. Besides, this survey also covers a wide range of Re-ID research objects, including progress in animal Re-ID. Given the diversity of species in animal Re-ID, we devise a standardized experimental benchmark and conduct extensive experiments to explore the applicability of Transformer for this task to facilitate future research. Finally, we discuss some important yet under-investigated open issues in the big foundation model era, we believe it will serve as a new handbook for researchers in this field.
- [846] arXiv:2401.06961 (cross-list from cs.CL) [ pdf , other ]
-
Title: CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning CapabilitiesComments: Project website at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent large language models (LLMs) have shown indications of mathematical reasoning ability. However it has not been clear how they would fare on more challenging competition-level problems. And while self-generated verbalizations of intermediate reasoning steps (i.e., chain-of-thought prompting) have been shown to be helpful, whether LLMs can make use of helpful side information such as problem-specific hints has not been investigated before. In this paper, we propose a challenging benchmark dataset for enabling such analyses. The Concept and Hint-Annotated Math Problems (CHAMP) consists of high school math competition problems, annotated with concepts, or general math facts, and hints, or problem-specific tricks. These annotations allow us to explore the effects of additional information, such as relevant hints, misleading concepts, or related problems. This benchmark is difficult, with the best model only scoring 58.1% in standard settings. With concepts and hints, performance sometimes improves, indicating that some models can make use of such side information. We further annotate model-generated solutions for their correctness. Using this corpus, we find that models often arrive at the correct final answer through wrong reasoning steps. In addition, we test whether models are able to verify these solutions, and find that most models struggle. The dataset and code are available on the project website.
- [847] arXiv:2401.06977 (cross-list from cs.RO) [ pdf , other ]
-
Title: Singing the Body Electric: The Impact of Robot Embodiment on User ExpectationsComments: Presented at the RSS Workshop on Social Intelligence in Humans and Robots, 2023Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Users develop mental models of robots to conceptualize what kind of interactions they can have with those robots. The conceptualizations are often formed before interactions with the robot and are based only on observing the robot's physical design. As a result, understanding conceptualizations formed from physical design is necessary to understand how users intend to interact with the robot. We propose to use multimodal features of robot embodiments to predict what kinds of expectations users will have about a given robot's social and physical capabilities. We show that using such features provides information about general mental models of the robots that generalize across socially interactive robots. We describe how these models can be incorporated into interaction design and physical design for researchers working with socially interactive robots.
- [848] arXiv:2401.06992 (cross-list from cs.CV) [ pdf , other ]
-
Title: Progressive Feature Fusion Network for Enhancing Image Quality AssessmentComments: Data Compression ConferenceSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Image compression has been applied in the fields of image storage and video broadcasting. However, it's formidably tough to distinguish the subtle quality differences between those distorted images generated by different algorithms. In this paper, we propose a new image quality assessment framework to decide which image is better in an image group. To capture the subtle differences, a fine-grained network is adopted to acquire multi-scale features. Subsequently, we design a cross subtract block for separating and gathering the information within positive and negative image pairs. Enabling image comparison in feature space. After that, a progressive feature fusion block is designed, which fuses multi-scale features in a novel progressive way. Hierarchical spatial 2D features can thus be processed gradually. Experimental results show that compared with the current mainstream image quality assessment methods, the proposed network can achieve more accurate image quality assessment and ranks second in the benchmark of CLIC in the image perceptual model track.
- [849] arXiv:2401.07009 (cross-list from cs.CL) [ pdf , other ]
-
Title: Joint Extraction of Uyghur Medicine Knowledge with Edge ComputingComments: 11 pages,6 figures,Has been accepted by Tsinghua Science and TechnologySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Medical knowledge extraction methods based on edge computing deploy deep learning models on edge devices to achieve localized entity and relation extraction. This approach avoids transferring substantial sensitive data to cloud data centers, effectively safeguarding the privacy of healthcare services. However, existing relation extraction methods mainly employ a sequential pipeline approach, which classifies relations between determined entities after entity recognition. This mode faces challenges such as error propagation between tasks, insufficient consideration of dependencies between the two subtasks, and the neglect of interrelations between different relations within a sentence. To address these challenges, a joint extraction model with parameter sharing in edge computing is proposed, named CoEx-Bert. This model leverages shared parameterization between two models to jointly extract entities and relations. Specifically, CoEx-Bert employs two models, each separately sharing hidden layer parameters, and combines these two loss functions for joint backpropagation to optimize the model parameters. Additionally, it effectively resolves the issue of entity overlapping when extracting knowledge from unstructured Uyghur medical texts by considering contextual relations. Finally, this model is deployed on edge devices for real-time extraction and inference of Uyghur medical knowledge. Experimental results demonstrate that CoEx-Bert outperforms existing state-of-the-art methods, achieving accuracy, recall, and F1 scores of 90.65\%, 92.45\%, and 91.54\%, respectively, in the Uyghur traditional medical literature dataset. These improvements represent a 6.45\% increase in accuracy, a 9.45\% increase in recall, and a 7.95\% increase in F1 score compared to the baseline.
- [850] arXiv:2401.07014 (cross-list from cs.CV) [ pdf , other ]
-
Title: Weak Labeling for Cropland Mapping in AfricaAuthors: Gilles Quentin Hacheme , Akram Zaytar , Girmaw Abebe Tadesse , Caleb Robinson , Rahul Dodhia , Juan M. Lavista Ferres , Stephen WoodComments: 5 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Cropland mapping can play a vital role in addressing environmental, agricultural, and food security challenges. However, in the context of Africa, practical applications are often hindered by the limited availability of high-resolution cropland maps. Such maps typically require extensive human labeling, thereby creating a scalability bottleneck. To address this, we propose an approach that utilizes unsupervised object clustering to refine existing weak labels, such as those obtained from global cropland maps. The refined labels, in conjunction with sparse human annotations, serve as training data for a semantic segmentation network designed to identify cropland areas. We conduct experiments to demonstrate the benefits of the improved weak labels generated by our method. In a scenario where we train our model with only 33 human-annotated labels, the F_1 score for the cropland category increases from 0.53 to 0.84 when we add the mined negative labels.
- [851] arXiv:2401.07022 (cross-list from cs.LG) [ pdf , other ]
-
Title: Edge-Enabled Anomaly Detection and Information Completion for Social Network Knowledge GraphsComments: 20 pages, 6 figures, Has been accepted by Wireless NetworkSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In the rapidly advancing information era, various human behaviors are being precisely recorded in the form of data, including identity information, criminal records, and communication data. Law enforcement agencies can effectively maintain social security and precisely combat criminal activities by analyzing the aforementioned data. In comparison to traditional data analysis methods, deep learning models, relying on the robust computational power in cloud centers, exhibit higher accuracy in extracting data features and inferring data. However, within the architecture of cloud centers, the transmission of data from end devices introduces significant latency, hindering real-time inference of data. Furthermore, low-latency edge computing architectures face limitations in direct deployment due to relatively weak computing and storage capacities of nodes. To address these challenges, a lightweight distributed knowledge graph completion architecture is proposed. Firstly, we introduce a lightweight distributed knowledge graph completion architecture that utilizes knowledge graph embedding for data analysis. Subsequently, to filter out substandard data, a personnel data quality assessment method named PDQA is proposed. Lastly, we present a model pruning algorithm that significantly reduces the model size while maximizing performance, enabling lightweight deployment. In experiments, we compare the effects of 11 advanced models on completing the knowledge graph of public security personnel information. The results indicate that the RotatE model outperforms other models significantly in knowledge graph completion, with the pruned model size reduced by 70\%, and hits@10 reaching 86.97\%.}
- [852] arXiv:2401.07031 (cross-list from cs.CR) [ pdf , other ]
-
Title: Code Security Vulnerability Repair Using Reinforcement Learning with Large Language ModelsSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: With the recent advancement of Large Language Models (LLMs), generating functionally correct code has become less complicated for a wide array of developers. While using LLMs has sped up the functional development process, it poses a heavy risk to code security. Code generation with proper security measures using LLM is a significantly more challenging task than functional code generation. Security measures may include adding a pair of lines of code with the original code, consisting of null pointer checking or prepared statements for SQL injection prevention. Currently, available code repair LLMs generate code repair by supervised fine-tuning, where the model looks at cross-entropy loss. However, the original and repaired codes are mostly similar in functionality and syntactically, except for a few (1-2) lines, which act as security measures. This imbalance between the lines needed for security measures and the functional code enforces the supervised fine-tuned model to prioritize generating functional code without adding proper security measures, which also benefits the model by resulting in minimal loss. Therefore, in this work, for security hardening and strengthening of generated code from LLMs, we propose a reinforcement learning-based method for program-specific repair with the combination of semantic and syntactic reward mechanisms that focus heavily on adding security and functional measures in the code, respectively.
- [853] arXiv:2401.07037 (cross-list from cs.CL) [ pdf , other ]
-
Title: xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought ReasoningAuthors: Linzheng Chai , Jian Yang , Tao Sun , Hongcheng Guo , Jiaheng Liu , Bing Wang , Xiannian Liang , Jiaqi Bai , Tongliang Li , Qiyao Peng , Zhoujun LiComments: 11 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Chain-of-thought (CoT) has emerged as a powerful technique to elicit reasoning in large language models and improve a variety of downstream tasks. CoT mainly demonstrates excellent performance in English, but its usage in low-resource languages is constrained due to poor language generalization. To bridge the gap among different languages, we propose a cross-lingual instruction fine-tuning framework (xCOT) to transfer knowledge from high-resource languages to low-resource languages. Specifically, the multilingual instruction training data (xCOT-INSTRUCT) is created to encourage the semantic alignment of multiple languages. We introduce cross-lingual in-context few-shot learning (xICL)) to accelerate multilingual agreement in instruction tuning, where some fragments of source languages in examples are randomly substituted by their counterpart translations of target languages. During multilingual instruction tuning, we adopt the randomly online CoT strategy to enhance the multilingual reasoning ability of the large language model by first translating the query to another language and then answering in English. To further facilitate the language transfer, we leverage the high-resource CoT to supervise the training of low-resource languages with cross-lingual distillation. Experimental results on previous benchmarks demonstrate the superior performance of xCoT in reducing the gap among different languages, highlighting its potential to reduce the cross-lingual gap.
- [854] arXiv:2401.07042 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: GEML: A Grammar-based Evolutionary Machine Learning Approach for Design-Pattern DetectionComments: 27 pages, 18 tables, 10 figures, journal paperJournal-ref: Journal of Systems and Software, Volume 175, May 2021, 110919Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Design patterns (DPs) are recognised as a good practice in software development. However, the lack of appropriate documentation often hampers traceability, and their benefits are blurred among thousands of lines of code. Automatic methods for DP detection have become relevant but are usually based on the rigid analysis of either software metrics or specific properties of the source code. We propose GEML, a novel detection approach based on evolutionary machine learning using software properties of diverse nature. Firstly, GEML makes use of an evolutionary algorithm to extract those characteristics that better describe the DP, formulated in terms of human-readable rules, whose syntax is conformant with a context-free grammar. Secondly, a rule-based classifier is built to predict whether new code contains a hidden DP implementation. GEML has been validated over five DPs taken from a public repository recurrently adopted by machine learning studies. Then, we increase this number up to 15 diverse DPs, showing its effectiveness and robustness in terms of detection capability. An initial parameter study served to tune a parameter setup whose performance guarantees the general applicability of this approach without the need to adjust complex parameters to a specific pattern. Finally, a demonstration tool is also provided.
- [855] arXiv:2401.07051 (cross-list from cs.LG) [ pdf , other ]
-
Title: COIN: Chance-Constrained Imitation Learning for Uncertainty-aware Adaptive Resource Oversubscription PolicyAuthors: Lu Wang , Mayukh Das , Fangkai Yang , Chao Duo , Bo Qiao , Hang Dong , Si Qin , Chetan Bansal , Qingwei Lin , Saravan Rajmohan , Dongmei Zhang , Qi ZhangComments: 9 pages, 4 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We address the challenge of learning safe and robust decision policies in presence of uncertainty in context of the real scientific problem of adaptive resource oversubscription to enhance resource efficiency while ensuring safety against resource congestion risk.
Traditional supervised prediction or forecasting models are ineffective in learning adaptive policies whereas standard online optimization or reinforcement learning is difficult to deploy on real systems. Offline methods such as imitation learning (IL) are ideal since we can directly leverage historical resource usage telemetry. But, the underlying aleatoric uncertainty in such telemetry is a critical bottleneck.
We solve this with our proposed novel chance-constrained imitation learning framework, which ensures implicit safety against uncertainty in a principled manner via a combination of stochastic (chance) constraints on resource congestion risk and ensemble value functions. This leads to substantial ($\approx 3-4\times$) improvement in resource efficiency and safety in many oversubscription scenarios, including resource management in cloud services. - [856] arXiv:2401.07058 (cross-list from cs.HC) [ pdf , other ]
-
Title: Does More Advice Help? The Effects of Second Opinions in AI-Assisted Decision MakingSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: AI assistance in decision-making has become popular, yet people's inappropriate reliance on AI often leads to unsatisfactory human-AI collaboration performance. In this paper, through three pre-registered, randomized human subject experiments, we explore whether and how the provision of {second opinions} may affect decision-makers' behavior and performance in AI-assisted decision-making. We find that if both the AI model's decision recommendation and a second opinion are always presented together, decision-makers reduce their over-reliance on AI while increase their under-reliance on AI, regardless whether the second opinion is generated by a peer or another AI model. However, if decision-makers have the control to decide when to solicit a peer's second opinion, we find that their active solicitations of second opinions have the potential to mitigate over-reliance on AI without inducing increased under-reliance in some cases. We conclude by discussing the implications of our findings for promoting effective human-AI collaborations in decision-making.
- [857] arXiv:2401.07062 (cross-list from cs.LG) [ pdf , other ]
-
Title: Dirichlet-Based Prediction Calibration for Learning with Noisy LabelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Learning with noisy labels can significantly hinder the generalization performance of deep neural networks (DNNs). Existing approaches address this issue through loss correction or example selection methods. However, these methods often rely on the model's predictions obtained from the softmax function, which can be over-confident and unreliable. In this study, we identify the translation invariance of the softmax function as the underlying cause of this problem and propose the \textit{Dirichlet-based Prediction Calibration} (DPC) method as a solution. Our method introduces a calibrated softmax function that breaks the translation invariance by incorporating a suitable constant in the exponent term, enabling more reliable model predictions. To ensure stable model training, we leverage a Dirichlet distribution to assign probabilities to predicted labels and introduce a novel evidence deep learning (EDL) loss. The proposed loss function encourages positive and sufficiently large logits for the given label, while penalizing negative and small logits for other labels, leading to more distinct logits and facilitating better example selection based on a large-margin criterion. Through extensive experiments on diverse benchmark datasets, we demonstrate that DPC achieves state-of-the-art performance. The code is available at this https URL .
- [858] arXiv:2401.07065 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Tensor Graph Convolutional Network for Dynamic Graph Representation LearningComments: 6 pages, 3 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Dynamic graphs (DG) describe dynamic interactions between entities in many practical scenarios. Most existing DG representation learning models combine graph convolutional network and sequence neural network, which model spatial-temporal dependencies through two different types of neural networks. However, this hybrid design cannot well capture the spatial-temporal continuity of a DG. In this paper, we propose a tensor graph convolutional network to learn DG representations in one convolution framework based on the tensor product with the following two-fold ideas: a) representing the information of DG by tensor form; b) adopting tensor product to design a tensor graph convolutional network modeling spatial-temporal feature simultaneously. Experiments on real-world DG datasets demonstrate that our model obtains state-of-the-art performance.
- [859] arXiv:2401.07072 (cross-list from cs.SE) [ pdf , other ]
-
Title: InterEvo-TR: Interactive Evolutionary Test Generation With Readability AssessmentAuthors: Pedro Delgado-Pérez , Aurora Ramírez , Kevin J. Valle-Gómez , Inmaculada Medina-Bulo , José Raúl RomeroComments: 17 pages, 10 figures, 5 tables, journal paperJournal-ref: IEEE Transactions on Software Engineering (Volume: 49, Issue: 4, 01 April 2023)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Automated test case generation has proven to be useful to reduce the usually high expenses of software testing. However, several studies have also noted the skepticism of testers regarding the comprehension of generated test suites when compared to manually designed ones. This fact suggests that involving testers in the test generation process could be helpful to increase their acceptance of automatically-produced test suites. In this paper, we propose incorporating interactive readability assessments made by a tester into EvoSuite, a widely-known evolutionary test generation tool. Our approach, InterEvo-TR, interacts with the tester at different moments during the search and shows different test cases covering the same coverage target for their subjective evaluation. The design of such an interactive approach involves a schedule of interaction, a method to diversify the selected targets, a plan to save and handle the readability values, and some mechanisms to customize the level of engagement in the revision, among other aspects. To analyze the potential and practicability of our proposal, we conduct a controlled experiment in which 39 participants, including academics, professional developers, and student collaborators, interact with InterEvo-TR. Our results show that the strategy to select and present intermediate results is effective for the purpose of readability assessment. Furthermore, the participants' actions and responses to a questionnaire allowed us to analyze the aspects influencing test code readability and the benefits and limitations of an interactive approach in the context of test case generation, paving the way for future developments based on interactivity.
- [860] arXiv:2401.07085 (cross-list from cs.LG) [ pdf , other ]
-
Title: When Does Feature Learning Happen? Perspective from an Analytically Solvable ModelSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We identify and solve a hidden-layer model that is analytically tractable at any finite width and whose limits exhibit both the kernel phase and the feature learning phase. We analyze the phase diagram of this model in all possible limits of common hyperparameters including width, layer-wise learning rates, scale of output, and scale of initialization. We apply our result to analyze how and when feature learning happens in both infinite and finite-width models. Three prototype mechanisms of feature learning are identified: (1) learning by alignment, (2) learning by disalignment, and (3) learning by rescaling. In sharp contrast, neither of these mechanisms is present when the model is in the kernel regime. This discovery explains why large initialization often leads to worse performance. Lastly, we empirically demonstrate that discoveries we made for this analytical model also appear in nonlinear networks in real tasks.
- [861] arXiv:2401.07102 (cross-list from cs.NE) [ pdf , other ]
-
Title: Evolving Code with A Large Language ModelComments: 34 pages, 9 figures, 6 TablesSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: Algorithms that use Large Language Models (LLMs) to evolve code arrived on the Genetic Programming (GP) scene very recently. We present LLM GP, a formalized LLM-based evolutionary algorithm designed to evolve code. Like GP, it uses evolutionary operators, but its designs and implementations of those operators radically differ from GP's because they enlist an LLM, using prompting and the LLM's pre-trained pattern matching and sequence completion capability. We also present a demonstration-level variant of LLM GP and share its code. By addressing algorithms that range from the formal to hands-on, we cover design and LLM-usage considerations as well as the scientific challenges that arise when using an LLM for genetic programming.
- [862] arXiv:2401.07105 (cross-list from cs.CL) [ pdf , other ]
-
Title: Graph Language ModelsComments: 8 pages, 10 figures, 9 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: While Language Models (LMs) are the workhorses of NLP, their interplay with structured knowledge graphs (KGs) is still actively researched. Current methods for encoding such graphs typically either (i) linearize them for embedding with LMs -- which underutilize structural information, or (ii) use Graph Neural Networks (GNNs) to preserve the graph structure -- but GNNs cannot represent text features as well as pretrained LMs. In our work we introduce a novel LM type, the Graph Language Model (GLM), that integrates the strengths of both approaches and mitigates their weaknesses. The GLM parameters are initialized from a pretrained LM to enhance understanding of individual graph concepts and triplets. Simultaneously, we design the GLM's architecture to incorporate graph biases, thereby promoting effective knowledge distribution within the graph. This enables GLMs to process graphs, texts, and interleaved inputs of both. Empirical evaluations on relation classification tasks show that GLM embeddings surpass both LM- and GNN-based baselines in supervised and zero-shot setting, demonstrating their versatility.
- [863] arXiv:2401.07118 (cross-list from cs.HC) [ pdf , other ]
-
Title: Exploring of Discrete and Continuous Input Control for AI-enhanced Assistive Robotic ArmsComments: Companion of the 2024 ACM/IEEE International Conference on Human-Robot InteractionSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Robotic arms, integral in domestic care for individuals with motor impairments, enable them to perform Activities of Daily Living (ADLs) independently, reducing dependence on human caregivers. These collaborative robots require users to manage multiple Degrees-of-Freedom (DoFs) for tasks like grasping and manipulating objects. Conventional input devices, typically limited to two DoFs, necessitate frequent and complex mode switches to control individual DoFs. Modern adaptive controls with feed-forward multi-modal feedback reduce the overall task completion time, number of mode switches, and cognitive load. Despite the variety of input devices available, their effectiveness in adaptive settings with assistive robotics has yet to be thoroughly assessed. This study explores three different input devices by integrating them into an established XR framework for assistive robotics, evaluating them and providing empirical insights through a preliminary study for future developments.
- [864] arXiv:2401.07128 (cross-list from cs.CL) [ pdf , other ]
-
Title: EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health RecordsAuthors: Wenqi Shi , Ran Xu , Yuchen Zhuang , Yue Yu , Jieyu Zhang , Hang Wu , Yuanda Zhu , Joyce Ho , Carl Yang , May D. WangComments: Work in ProgressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have demonstrated exceptional capabilities in planning and tool utilization as autonomous agents, but few have been developed for medical problem-solving. We propose EHRAgent, an LLM agent empowered with a code interface, to autonomously generate and execute code for multi-tabular reasoning within electronic health records (EHRs). First, we formulate an EHR question-answering task into a tool-use planning process, efficiently decomposing a complicated task into a sequence of manageable actions. By integrating interactive coding and execution feedback, EHRAgent learns from error messages and improves the originally generated code through iterations. Furthermore, we enhance the LLM agent by incorporating long-term memory, which allows EHRAgent to effectively select and build upon the most relevant successful cases from past experiences. Experiments on three real-world multi-tabular EHR datasets show that EHRAgent outperforms the strongest baseline by up to 29.6% in success rate. EHRAgent leverages the emerging few-shot learning capabilities of LLMs, enabling autonomous code generation and execution to tackle complex clinical tasks with minimal demonstrations.
- [865] arXiv:2401.07139 (cross-list from cs.CV) [ pdf , other ]
-
Title: Deep Blind Super-Resolution for Satellite VideoComments: Published in IEEE TGRSJournal-ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1-16, 2023, Art no. 5516316Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Recent efforts have witnessed remarkable progress in Satellite Video Super-Resolution (SVSR). However, most SVSR methods usually assume the degradation is fixed and known, e.g., bicubic downsampling, which makes them vulnerable in real-world scenes with multiple and unknown degradations. To alleviate this issue, blind SR has thus become a research hotspot. Nevertheless, existing approaches are mainly engaged in blur kernel estimation while losing sight of another critical aspect for VSR tasks: temporal compensation, especially compensating for blurry and smooth pixels with vital sharpness from severely degraded satellite videos. Therefore, this paper proposes a practical Blind SVSR algorithm (BSVSR) to explore more sharp cues by considering the pixel-wise blur levels in a coarse-to-fine manner. Specifically, we employed multi-scale deformable convolution to coarsely aggregate the temporal redundancy into adjacent frames by window-slid progressive fusion. Then the adjacent features are finely merged into mid-feature using deformable attention, which measures the blur levels of pixels and assigns more weights to the informative pixels, thus inspiring the representation of sharpness. Moreover, we devise a pyramid spatial transformation module to adjust the solution space of sharp mid-feature, resulting in flexible feature adaptation in multi-level domains. Quantitative and qualitative evaluations on both simulated and real-world satellite videos demonstrate that our BSVSR performs favorably against state-of-the-art non-blind and blind SR models. Code will be available at this https URL
- [866] arXiv:2401.07145 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Scalable and Efficient Methods for Uncertainty Estimation and Reduction in Deep LearningAuthors: Soyed Tuhin AhmedSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Neural networks (NNs) can achieved high performance in various fields such as computer vision, and natural language processing. However, deploying NNs in resource-constrained safety-critical systems has challenges due to uncertainty in the prediction caused by out-of-distribution data, and hardware non-idealities. To address the challenges of deploying NNs in resource-constrained safety-critical systems, this paper summarizes the (4th year) PhD thesis work that explores scalable and efficient methods for uncertainty estimation and reduction in deep learning, with a focus on Computation-in-Memory (CIM) using emerging resistive non-volatile memories. We tackle the inherent uncertainties arising from out-of-distribution inputs and hardware non-idealities, crucial in maintaining functional safety in automated decision-making systems. Our approach encompasses problem-aware training algorithms, novel NN topologies, and hardware co-design solutions, including dropout-based \emph{binary} Bayesian Neural Networks leveraging spintronic devices and variational inference techniques. These innovations significantly enhance OOD data detection, inference accuracy, and energy efficiency, thereby contributing to the reliability and robustness of NN implementations.
- [867] arXiv:2401.07159 (cross-list from cs.LG) [ pdf , other ]
-
Title: Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language ModelsAuthors: Zhengxin Zhang , Dan Zhao , Xupeng Miao , Gabriele Oliaro , Qing Li , Yong Jiang , Zhihao JiaSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Finetuning large language models (LLMs) has been empirically effective on a variety of downstream tasks. Existing approaches to finetuning an LLM either focus on parameter-efficient finetuning, which only updates a small number of trainable parameters, or attempt to reduce the memory footprint during the training phase of the finetuning. Typically, the memory footprint during finetuning stems from three contributors: model weights, optimizer states, and intermediate activations. However, existing works still require considerable memory and none can simultaneously mitigate memory footprint for all three sources. In this paper, we present Quantized Side Tuing (QST), which enables memory-efficient and fast finetuning of LLMs by operating through a dual-stage process. First, QST quantizes an LLM's model weights into 4-bit to reduce the memory footprint of the LLM's original weights; QST also introduces a side network separated from the LLM, which utilizes the hidden states of the LLM to make task-specific predictions. Using a separate side network avoids performing backpropagation through the LLM, thus reducing the memory requirement of the intermediate activations. Furthermore, QST leverages several low-rank adaptors and gradient-free downsample modules to significantly reduce the trainable parameters, so as to save the memory footprint of the optimizer states. Experiments show that QST can reduce the total memory footprint by up to 2.3 $\times$ and speed up the finetuning process by up to 3 $\times$ while achieving competent performance compared with the state-of-the-art. When it comes to full finetuning, QST can reduce the total memory footprint up to 7 $\times$.
- [868] arXiv:2401.07179 (cross-list from cs.CE) [ pdf , other ]
-
Title: Forecasting GDP in Europe with Textual DataComments: 34 pages, 6 figures, published in Journal of Applied Econometrics (Early view)Subjects: Computational Engineering, Finance, and Science (cs.CE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We evaluate the informational content of news-based sentiment indicators for forecasting Gross Domestic Product (GDP) and other macroeconomic variables of the five major European economies. Our data set includes over 27 million articles for 26 major newspapers in 5 different languages. The evidence indicates that these sentiment indicators are significant predictors to forecast macroeconomic variables and their predictive content is robust to controlling for other indicators available to forecasters in real-time.
- [869] arXiv:2401.07220 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Application of 2D Homography for High Resolution Traffic Data Collection using CCTV CamerasComments: 25 pages, 9 figures, this paper was submitted for consideration for presentation at the 102nd Annual Meeting of the Transportation Research Board, January 2023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Traffic cameras remain the primary source data for surveillance activities such as congestion and incident monitoring. To date, State agencies continue to rely on manual effort to extract data from networked cameras due to limitations of the current automatic vision systems including requirements for complex camera calibration and inability to generate high resolution data. This study implements a three-stage video analytics framework for extracting high-resolution traffic data such vehicle counts, speed, and acceleration from infrastructure-mounted CCTV cameras. The key components of the framework include object recognition, perspective transformation, and vehicle trajectory reconstruction for traffic data collection. First, a state-of-the-art vehicle recognition model is implemented to detect and classify vehicles. Next, to correct for camera distortion and reduce partial occlusion, an algorithm inspired by two-point linear perspective is utilized to extracts the region of interest (ROI) automatically, while a 2D homography technique transforms the CCTV view to bird's-eye view (BEV). Cameras are calibrated with a two-layer matrix system to enable the extraction of speed and acceleration by converting image coordinates to real-world measurements. Individual vehicle trajectories are constructed and compared in BEV using two time-space-feature-based object trackers, namely Motpy and BYTETrack. The results of the current study showed about +/- 4.5% error rate for directional traffic counts, less than 10% MSE for speed bias between camera estimates in comparison to estimates from probe data sources. Extracting high-resolution data from traffic cameras has several implications, ranging from improvements in traffic management and identify dangerous driving behavior, high-risk areas for accidents, and other safety concerns, enabling proactive measures to reduce accidents and fatalities.
- [870] arXiv:2401.07234 (cross-list from cs.LG) [ pdf , other ]
-
Title: The Effects of Data Imbalance Under a Federated Learning Approach for Credit Risk ForecastingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Credit risk forecasting plays a crucial role for commercial banks and other financial institutions in granting loans to customers and minimise the potential loss. However, traditional machine learning methods require the sharing of sensitive client information with an external server to build a global model, potentially posing a risk of security threats and privacy leakage. A newly developed privacy-preserving distributed machine learning technique known as Federated Learning (FL) allows the training of a global model without the necessity of accessing private local data directly. This investigation examined the feasibility of federated learning in credit risk assessment and showed the effects of data imbalance on model performance. Two neural network architectures, Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM), and one tree ensemble architecture, Extreme Gradient Boosting (XGBoost), were explored across three different datasets under various scenarios involving different numbers of clients and data distribution configurations. We demonstrate that federated models consistently outperform local models on non-dominant clients with smaller datasets. This trend is especially pronounced in highly imbalanced data scenarios, yielding a remarkable average improvement of 17.92% in model performance. However, for dominant clients (clients with more data), federated models may not exhibit superior performance, suggesting the need for special incentives for this type of clients to encourage their participation.
- [871] arXiv:2401.07237 (cross-list from cs.CL) [ pdf , other ]
-
Title: Distilling Event Sequence Knowledge From Large Language ModelsComments: Under ReviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Event sequence models have been found to be highly effective in the analysis and prediction of events. Building such models requires availability of abundant high-quality event sequence data. In certain applications, however, clean structured event sequences are not available, and automated sequence extraction results in data that is too noisy and incomplete. In this work, we explore the use of Large Language Models (LLMs) to generate event sequences that can effectively be used for probabilistic event model construction. This can be viewed as a mechanism of distilling event sequence knowledge from LLMs. Our approach relies on a Knowledge Graph (KG) of event concepts with partial causal relations to guide the generative language model for causal event sequence generation. We show that our approach can generate high-quality event sequences, filling a knowledge gap in the input KG. Furthermore, we explore how the generated sequences can be leveraged to discover useful and more complex structured knowledge from pattern mining and probabilistic event models. We release our sequence generation code and evaluation framework, as well as corpus of event sequence data.
- [872] arXiv:2401.07250 (cross-list from cs.LG) [ pdf , other ]
-
Title: Stabilizing Sharpness-aware Minimization Through A Simple Renormalization StrategyComments: 31 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recently, sharpness-aware minimization (SAM) has attracted a lot of attention because of its surprising effectiveness in improving generalization performance.However, training neural networks with SAM can be highly unstable since the loss does not decrease along the direction of the exact gradient at the current point, but instead follows the direction of a surrogate gradient evaluated at another point nearby. To address this issue, we propose a simple renormalization strategy, dubbed StableSAM, so that the norm of the surrogate gradient maintains the same as that of the exact gradient. Our strategy is easy to implement and flexible enough to integrate with SAM and its variants, almost at no computational cost. With elementary tools from convex optimization and learning theory, we also conduct a theoretical analysis of sharpness-aware training, revealing that compared to stochastic gradient descent (SGD), the effectiveness of SAM is only assured in a limited regime of learning rate. In contrast, we show how StableSAM extends this regime of learning rate and when it can consistently perform better than SAM with minor modification. Finally, we demonstrate the improved performance of StableSAM on several representative data sets and tasks.
- [873] arXiv:2401.07263 (cross-list from cs.LG) [ pdf , other ]
-
Title: BET: Explaining Deep Reinforcement Learning through The Error-Prone DecisionsComments: This is an early version of a paper that submitted to IJCAI 2024 8 pages, 4 figures and 1 tableSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Despite the impressive capabilities of Deep Reinforcement Learning (DRL) agents in many challenging scenarios, their black-box decision-making process significantly limits their deployment in safety-sensitive domains. Several previous self-interpretable works focus on revealing the critical states of the agent's decision. However, they cannot pinpoint the error-prone states. To address this issue, we propose a novel self-interpretable structure, named Backbone Extract Tree (BET), to better explain the agent's behavior by identify the error-prone states. At a high level, BET hypothesizes that states in which the agent consistently executes uniform decisions exhibit a reduced propensity for errors. To effectively model this phenomenon, BET expresses these states within neighborhoods, each defined by a curated set of representative states. Therefore, states positioned at a greater distance from these representative benchmarks are more prone to error. We evaluate BET in various popular RL environments and show its superiority over existing self-interpretable models in terms of explanation fidelity. Furthermore, we demonstrate a use case for providing explanations for the agents in StarCraft II, a sophisticated multi-agent cooperative game. To the best of our knowledge, we are the first to explain such a complex scenarios using a fully transparent structure.
- [874] arXiv:2401.07271 (cross-list from cs.CV) [ pdf , other ]
-
Title: SpineCLUE: Automatic Vertebrae Identification Using Contrastive Learning and Uncertainty EstimationAuthors: Sheng Zhang , Minheng Chen , Junxian Wu , Ziyue Zhang , Tonglong Li , Cheng Xue , Youyong KongSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Vertebrae identification in arbitrary fields-of-view plays a crucial role in diagnosing spine disease. Most spine CT contain only local regions, such as the neck, chest, and abdomen. Therefore, identification should not depend on specific vertebrae or a particular number of vertebrae being visible. Existing methods at the spine-level are unable to meet this challenge. In this paper, we propose a three-stage method to address the challenges in 3D CT vertebrae identification at vertebrae-level. By sequentially performing the tasks of vertebrae localization, segmentation, and identification, the anatomical prior information of the vertebrae is effectively utilized throughout the process. Specifically, we introduce a dual-factor density clustering algorithm to acquire localization information for individual vertebra, thereby facilitating subsequent segmentation and identification processes. In addition, to tackle the issue of interclass similarity and intra-class variability, we pre-train our identification network by using a supervised contrastive learning method. To further optimize the identification results, we estimated the uncertainty of the classification network and utilized the message fusion module to combine the uncertainty scores, while aggregating global information about the spine. Our method achieves state-of-the-art results on the VerSe19 and VerSe20 challenge benchmarks. Additionally, our approach demonstrates outstanding generalization performance on an collected dataset containing a wide range of abnormal cases.
- [875] arXiv:2401.07278 (cross-list from cs.CV) [ pdf , other ]
-
Title: Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood CellsAuthors: Vinh Quoc Luu , Duy Khanh Le , Huy Thanh Nguyen , Minh Thanh Nguyen , Thinh Tien Nguyen , Vinh Quang DinhSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Artificial Intelligence (AI) in healthcare, especially in white blood cell cancer diagnosis, is hindered by two primary challenges: the lack of large-scale labeled datasets for white blood cell (WBC) segmentation and outdated segmentation methods. These challenges inhibit the development of more accurate and modern techniques to diagnose cancer relating to white blood cells. To address the first challenge, a semi-supervised learning framework should be devised to efficiently capitalize on the scarcity of the dataset available. In this work, we address this issue by proposing a novel self-training pipeline with the incorporation of FixMatch. Self-training is a technique that utilizes the model trained on labeled data to generate pseudo-labels for the unlabeled data and then re-train on both of them. FixMatch is a consistency-regularization algorithm to enforce the model's robustness against variations in the input image. We discover that by incorporating FixMatch in the self-training pipeline, the performance improves in the majority of cases. Our performance achieved the best performance with the self-training scheme with consistency on DeepLab-V3 architecture and ResNet-50, reaching 90.69%, 87.37%, and 76.49% on Zheng 1, Zheng 2, and LISC datasets, respectively.
- [876] arXiv:2401.07301 (cross-list from cs.CL) [ pdf , other ]
-
Title: Small Language Model Can Self-correctSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Generative Language Models (LMs) such as ChatGPT have exhibited remarkable performance across various downstream tasks. Nevertheless, one of their most prominent drawbacks is generating inaccurate or false information with a confident tone. Previous studies have devised sophisticated pipelines and prompts to induce large LMs to exhibit the capability for self-correction. However, large LMs are explicitly prompted to verify and modify its answers separately rather than completing all steps spontaneously like humans. Moreover, these complex prompts are extremely challenging for small LMs to follow. In this paper, we introduce the \underline{I}ntrinsic \underline{S}elf-\underline{C}orrection (ISC) in generative language models, aiming to correct the initial output of LMs in a self-triggered manner, even for those small LMs with 6 billion parameters. Specifically, we devise a pipeline for constructing self-correction data and propose Partial Answer Masking (PAM), aiming to endow the model with the capability for intrinsic self-correction through fine-tuning. We conduct experiments using LMs with parameters sizes ranging from 6 billion to 13 billion in two tasks, including commonsense reasoning and factual knowledge reasoning. Our experiments demonstrate that the outputs generated using ISC outperform those generated without self-correction. We believe that the output quality of even small LMs can be further improved by empowering them with the ability to intrinsic self-correct.
- [877] arXiv:2401.07333 (cross-list from cs.CL) [ pdf , other ]
-
Title: ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence ReorderingComments: Working in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation. However, existing methods still have some limitations: 1) repetitions, transpositions, and omissions in the output synthesized speech due to limited alignment constraints between audio and phoneme tokens; 2) challenges of fine-grained control over the synthesized speech with autoregressive (AR) language model; 3) infinite silence generation due to the nature of AR-based decoding, especially under the greedy strategy. To alleviate these issues, we propose ELLA-V, a simple but efficient LM-based zero-shot text-to-speech (TTS) framework, which enables fine-grained control over synthesized audio at the phoneme level. The key to ELLA-V is interleaving sequences of acoustic and phoneme tokens, where phoneme tokens appear ahead of the corresponding acoustic tokens. The experimental findings reveal that our model outperforms VALL-E in terms of accuracy and delivers more stable results using both greedy and sampling-based decoding strategies. The code of ELLA-V will be open-sourced after cleanups. Audio samples are available at this https URL .
- [878] arXiv:2401.07348 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Generative AI in EU Law: Liability, Privacy, Intellectual Property, and CybersecuritySubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: The advent of Generative AI, particularly through Large Language Models (LLMs) like ChatGPT and its successors, marks a paradigm shift in the AI landscape. Advanced LLMs exhibit multimodality, handling diverse data formats, thereby broadening their application scope. However, the complexity and emergent autonomy of these models introduce challenges in predictability and legal compliance. This paper delves into the legal and regulatory implications of Generative AI and LLMs in the European Union context, analyzing aspects of liability, privacy, intellectual property, and cybersecurity. It critically examines the adequacy of the existing and proposed EU legislation, including the Artificial Intelligence Act (AIA) draft, in addressing the unique challenges posed by Generative AI in general and LLMs in particular. The paper identifies potential gaps and shortcomings in the legislative framework and proposes recommendations to ensure the safe and compliant deployment of generative models, ensuring they align with the EU's evolving digital landscape and legal standards.
- [879] arXiv:2401.07353 (cross-list from cs.SE) [ pdf , other ]
-
Title: Towards Engineering Fair and Equitable Software Systems for Managing Low-Altitude Airspace AuthorizationsAuthors: Usman Gohar , Michael C. Hunter , Agnieszka Marczak-Czajka , Robyn R. Lutz , Myra B. Cohen , Jane Cleland-HuangJournal-ref: ICSE-SEIS 2024Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Small Unmanned Aircraft Systems (sUAS) have gained widespread adoption across a diverse range of applications. This has introduced operational complexities within shared airspaces and an increase in reported incidents, raising safety concerns. In response, the U.S. Federal Aviation Administration (FAA) is developing a UAS Traffic Management (UTM) system to control access to airspace based on an sUAS's predicted ability to safely complete its mission. However, a fully automated system capable of swiftly approving or denying flight requests can be prone to bias and must consider safety, transparency, and fairness to diverse stakeholders. In this paper, we present an initial study that explores stakeholders' perspectives on factors that should be considered in an automated system. Results indicate flight characteristics and environmental conditions were perceived as most important but pilot and drone capabilities should also be considered. Further, several respondents indicated an aversion to any AI-supported automation, highlighting the need for full transparency in automated decision-making. Results provide a societal perspective on the challenges of automating UTM flight authorization decisions and help frame the ongoing design of a solution acceptable to the broader sUAS community.
- [880] arXiv:2401.07364 (cross-list from cs.LG) [ pdf , other ]
-
Title: PDE Generalization of In-Context Operator Networks: A Study on 1D Scalar Nonlinear Conservation LawsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
Abstract: Can we build a single large model for a wide range of PDE-related scientific learning tasks? Can this model generalize to new PDEs, even of new forms, without any fine-tuning? In-context operator learning and the corresponding model In-Context Operator Networks (ICON) represent an initial exploration of these questions. The capability of ICON regarding the first question has been demonstrated previously. In this paper, we present a detailed methodology for solving PDE problems with ICON, and show how a single ICON model can make forward and reverse predictions for different equations with different strides, provided with appropriately designed data prompts. We show the positive evidence to the second question, i.e., ICON can generalize well to some PDEs with new forms without any fine-tuning. This is exemplified through a study on 1D scalar nonlinear conservation laws, a family of PDEs with temporal evolution. We also show how to broaden the range of problems that an ICON model can address, by transforming functions and equations to ICON's capability scope. We believe that the progress in this paper is a significant step towards the goal of training a foundation model for PDE-related tasks under the in-context operator learning framework.
- [881] arXiv:2401.07378 (cross-list from cs.CV) [ pdf , other ]
-
Title: Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor SearchAuthors: Guangyu Meng , Ruyu Zhou , Liu Liu , Peixian Liang , Fang Liu , Danny Chen , Michael Niemier , X.Sharon HuSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Earth Mover's Distance (EMD) is an important similarity measure between two distributions, used in computer vision and many other application domains. However, its exact calculation is computationally and memory intensive, which hinders its scalability and applicability for large-scale problems. Various approximate EMD algorithms have been proposed to reduce computational costs, but they suffer lower accuracy and may require additional memory usage or manual parameter tuning. In this paper, we present a novel approach, NNS-EMD, to approximate EMD using Nearest Neighbor Search (NNS), in order to achieve high accuracy, low time complexity, and high memory efficiency. The NNS operation reduces the number of data points compared in each NNS iteration and offers opportunities for parallel processing. We further accelerate NNS-EMD via vectorization on GPU, which is especially beneficial for large datasets. We compare NNS-EMD with both the exact EMD and state-of-the-art approximate EMD algorithms on image classification and retrieval tasks. We also apply NNS-EMD to calculate transport mapping and realize color transfer between images. NNS-EMD can be 44x to 135x faster than the exact EMD implementation, and achieves superior accuracy, speedup, and memory efficiency over existing approximate EMD methods.
- [882] arXiv:2401.07382 (cross-list from cs.CL) [ pdf , other ]
-
Title: Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text GenerationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals - typically, there is only a single reward for an entire output. This sparsity of rewards can lead to inefficient and unstable learning. To address this challenge, our paper introduces an novel framework that utilizes the critique capability of Large Language Models (LLMs) to produce intermediate-step rewards during RL training. Our method involves coupling a policy model with a critic language model, which is responsible for providing comprehensive feedback of each part of the output. This feedback is then translated into token or span-level rewards that can be used to guide the RL training process. We investigate this approach under two different settings: one where the policy model is smaller and is paired with a more powerful critic model, and another where a single language model fulfills both roles. We assess our approach on three text generation tasks: sentiment control, language model detoxification, and summarization. Experimental results show that incorporating artificial intrinsic rewards significantly improve both sample efficiency and the overall performance of the policy model, supported by both automatic and human evaluation.
- [883] arXiv:2401.07387 (cross-list from cs.LG) [ pdf , other ]
-
Title: Optimising network interactions through device agnostic modelsAuthors: Luca Manneschi , Ian T. Vidamour , Kilian D. Stenning , Jack C. Gartside , Charles Swindells , Guru Venkat , David Griffin , Susan Stepney , Will R. Branford , Thomas Hayward , Matt O Ellis , Eleni VasilakiSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Neural and Evolutionary Computing (cs.NE)
Abstract: Physically implemented neural networks hold the potential to achieve the performance of deep learning models by exploiting the innate physical properties of devices as computational tools. This exploration of physical processes for computation requires to also consider their intrinsic dynamics, which can serve as valuable resources to process information. However, existing computational methods are unable to extend the success of deep learning techniques to parameters influencing device dynamics, which often lack a precise mathematical description. In this work, we formulate a universal framework to optimise interactions with dynamic physical systems in a fully data-driven fashion. The framework adopts neural stochastic differential equations as differentiable digital twins, effectively capturing both deterministic and stochastic behaviours of devices. Employing differentiation through the trained models provides the essential mathematical estimates for optimizing a physical neural network, harnessing the intrinsic temporal computation abilities of its physical nodes. To accurately model real devices' behaviours, we formulated neural-SDE variants that can operate under a variety of experimental settings. Our work demonstrates the framework's applicability through simulations and physical implementations of interacting dynamic devices, while highlighting the importance of accurately capturing system stochasticity for the successful deployment of a physically defined neural network.
- [884] arXiv:2401.07389 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Rapid Review of Clustering AlgorithmsAuthors: Hui Yin , Amir Aryani , Stephen Petrie , Aishwarya Nambissan , Aland Astudillo , Shengyuan CaoComments: 25 pages, 7 figures, 3 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Clustering algorithms aim to organize data into groups or clusters based on the inherent patterns and similarities within the data. They play an important role in today's life, such as in marketing and e-commerce, healthcare, data organization and analysis, and social media. Numerous clustering algorithms exist, with ongoing developments introducing new ones. Each algorithm possesses its own set of strengths and weaknesses, and as of now, there is no universally applicable algorithm for all tasks. In this work, we analyzed existing clustering algorithms and classify mainstream algorithms across five different dimensions: underlying principles and characteristics, data point assignment to clusters, dataset capacity, predefined cluster numbers and application area. This classification facilitates researchers in understanding clustering algorithms from various perspectives and helps them identify algorithms suitable for solving specific tasks. Finally, we discussed the current trends and potential future directions in clustering algorithms. We also identified and discussed open challenges and unresolved issues in the field.
- [885] arXiv:2401.07395 (cross-list from cs.LG) [ pdf , other ]
-
Title: Harnessing the Power of Beta Scoring in Deep Active Learning for Multi-Label Text ClassificationComments: 7 pages AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Within the scope of natural language processing, the domain of multi-label text classification is uniquely challenging due to its expansive and uneven label distribution. The complexity deepens due to the demand for an extensive set of annotated data for training an advanced deep learning model, especially in specialized fields where the labeling task can be labor-intensive and often requires domain-specific knowledge. Addressing these challenges, our study introduces a novel deep active learning strategy, capitalizing on the Beta family of proper scoring rules within the Expected Loss Reduction framework. It computes the expected increase in scores using the Beta Scoring Rules, which are then transformed into sample vector representations. These vector representations guide the diverse selection of informative samples, directly linking this process to the model's expected proper score. Comprehensive evaluations across both synthetic and real datasets reveal our method's capability to often outperform established acquisition techniques in multi-label text classification, presenting encouraging outcomes across various architectural and dataset scenarios.
- [886] arXiv:2401.07445 (cross-list from cs.IR) [ pdf , other ]
-
Title: GACE: Learning Graph-Based Cross-Page Ads Embedding For Click-Through Rate PredictionAuthors: Haowen Wang , Yuliang Du , Congyun Jin , Yujiao Li , Yingbo Wang , Tao Sun , Piqi Qin , Cong FanComments: 15 pages, 3 figuresSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
Abstract: Predicting click-through rate (CTR) is the core task of many ads online recommendation systems, which helps improve user experience and increase platform revenue. In this type of recommendation system, we often encounter two main problems: the joint usage of multi-page historical advertising data and the cold start of new ads. In this paper, we proposed GACE, a graph-based cross-page ads embedding generation method. It can warm up and generate the representation embedding of cold-start and existing ads across various pages. Specifically, we carefully build linkages and a weighted undirected graph model considering semantic and page-type attributes to guide the direction of feature fusion and generation. We designed a variational auto-encoding task as pre-training module and generated embedding representations for new and old ads based on this task. The results evaluated in the public dataset AliEC from RecBole and the real-world industry dataset from Alipay show that our GACE method is significantly superior to the SOTA method. In the online A/B test, the click-through rate on three real-world pages from Alipay has increased by 3.6%, 2.13%, and 3.02%, respectively. Especially in the cold-start task, the CTR increased by 9.96%, 7.51%, and 8.97%, respectively.
- [887] arXiv:2401.07447 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Taec: a Manually annotated text dataset for trait and phenotype extraction and entity linking in wheat breeding literatureComments: 17 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Wheat varieties show a large diversity of traits and phenotypes. Linking them to genetic variability is essential for shorter and more efficient wheat breeding programs. Newly desirable wheat variety traits include disease resistance to reduce pesticide use, adaptation to climate change, resistance to heat and drought stresses, or low gluten content of grains. Wheat breeding experiments are documented by a large body of scientific literature and observational data obtained in-field and under controlled conditions. The cross-referencing of complementary information from the literature and observational data is essential to the study of the genotype-phenotype relationship and to the improvement of wheat selection. The scientific literature on genetic marker-assisted selection describes much information about the genotype-phenotype relationship. However, the variety of expressions used to refer to traits and phenotype values in scientific articles is a hinder to finding information and cross-referencing it. When trained adequately by annotated examples, recent text mining methods perform highly in named entity recognition and linking in the scientific domain. While several corpora contain annotations of human and animal phenotypes, currently, no corpus is available for training and evaluating named entity recognition and entity-linking methods in plant phenotype literature. The Triticum aestivum trait Corpus is a new gold standard for traits and phenotypes of wheat. It consists of 540 PubMed references fully annotated for trait, phenotype, and species named entities using the Wheat Trait and Phenotype Ontology and the species taxonomy of the National Center for Biotechnology Information. A study of the performance of tools trained on the Triticum aestivum trait Corpus shows that the corpus is suitable for the training and evaluation of named entity recognition and linking.
- [888] arXiv:2401.07450 (cross-list from cs.CV) [ pdf , other ]
-
Title: Hierarchical Fashion Design with Multi-stage Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Cross-modal fashion synthesis and editing offer intelligent support to fashion designers by enabling the automatic generation and local modification of design drafts.While current diffusion models demonstrate commendable stability and controllability in image synthesis,they still face significant challenges in generating fashion design from abstract design elements and fine-grained editing.Abstract sensory expressions, \eg office, business, and party, form the high-level design concepts, while measurable aspects like sleeve length, collar type, and pant length are considered the low-level attributes of clothing.Controlling and editing fashion images using lengthy text descriptions poses a this http URL this paper, we propose HieraFashDiff,a novel fashion design method using the shared multi-stage diffusion model encompassing high-level design concepts and low-level clothing attributes in a hierarchical structure.Specifically, we categorized the input text into different levels and fed them in different time step to the diffusion model according to the criteria of professional clothing designers.HieraFashDiff allows designers to add low-level attributes after high-level prompts for interactive editing this http URL addition, we design a differentiable loss function in the sampling process with a mask to keep non-edit areas.Comprehensive experiments performed on our newly conducted Hierarchical fashion dataset,demonstrate that our proposed method outperforms other state-of-the-art competitors.
- [889] arXiv:2401.07453 (cross-list from cs.CL) [ pdf , other ]
-
Title: Model Editing at Scale leads to Gradual and Catastrophic ForgettingComments: Added experiments with GPT-J and appendix for all samples and ablationsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Editing knowledge in large language models is an attractive capability to have which allows us to correct incorrectly learnt facts during pre-training, as well as update the model with an ever-growing list of new facts. While existing model editing techniques have shown promise, they are usually evaluated using metrics for reliability, specificity and generalization over one or few edits. We argue that for model editing to have practical utility, we must be able to make multiple edits to the same model. With this in mind, we evaluate the current model editing methods at scale, focusing on two state of the art methods: ROME and MEMIT. We find that as the model is edited sequentially with multiple facts, it continually forgets previously edited facts and the ability to perform downstream tasks. This forgetting happens in two phases -- an initial gradual but progressive forgetting phase followed by abrupt or catastrophic forgetting phase. Both gradual and catastrophic forgetting limit the usefulness of model editing methods at scale -- the former making model editing less effective as multiple edits are made to the model while the latter caps the scalability of such model editing methods. Our analysis also highlights other key limitations of ROME and MEMIT at scale. With our work, we push for the development and evaluation of model editing methods keeping scalability in mind.
- [890] arXiv:2401.07456 (cross-list from cs.CL) [ pdf , other ]
-
Title: Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine TranslationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Federated learning (FL) is a promising approach for solving multilingual tasks, potentially enabling clients with their own language-specific data to collaboratively construct a high-quality neural machine translation (NMT) model. However, communication constraints in practical network systems present challenges for exchanging large-scale NMT engines between FL parties. In this paper, we propose a meta-learning-based adaptive parameter selection methodology, MetaSend, that improves the communication efficiency of model transmissions from clients during FL-based multilingual NMT training. Our approach learns a dynamic threshold for filtering parameters prior to transmission without compromising the NMT model quality, based on the tensor deviations of clients between different FL rounds. Through experiments on two NMT datasets with different language distributions, we demonstrate that MetaSend obtains substantial improvements over baselines in translation quality in the presence of a limited communication budget.
- [891] arXiv:2401.07466 (cross-list from cs.SE) [ pdf , other ]
-
Title: Your Instructions Are Not Always Helpful: Assessing the Efficacy of Instruction Fine-tuning for Software Vulnerability DetectionSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Software, while beneficial, poses potential cybersecurity risks due to inherent vulnerabilities. Detecting these vulnerabilities is crucial, and deep learning has shown promise as an effective tool for this task due to its ability to perform well without extensive feature engineering. However, a challenge in deploying deep learning for vulnerability detection is the limited availability of training data. Recent research highlights the deep learning efficacy in diverse tasks. This success is attributed to instruction fine-tuning, a technique that remains under-explored in the context of vulnerability detection. This paper investigates the capability of models, specifically a recent language model, to generalize beyond the programming languages used in their training data. It also examines the role of natural language instructions in enhancing this generalization. Our study evaluates the model performance on a real-world dataset to predict vulnerable code. We present key insights and lessons learned, contributing to understanding the deep learning application in software vulnerability detection.
- [892] arXiv:2401.07468 (cross-list from cs.LG) [ pdf , other ]
-
Title: CarSpeedNet: A Deep Neural Network-based Car Speed Estimation from Smartphone AccelerometerAuthors: Barak OrComments: 8 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In this study, a novel deep neural network (DNN) architecture, CarSpeedNet, is introduced to estimate car speed using three-axis accelerometer data from smartphones. Utilizing 13 hours of data collected from smartphones mounted in vehicles navigating through various regions in Israel, the CarSpeedNet effectively learns the relationship between measured smartphone acceleration and car speed. Ground truth speed data was obtained at 1[Hz] from the GPS receiver in the smartphones. The proposed model enables high-frequency speed estimation, incorporating historical inputs. Our trained model demonstrates exceptional accuracy in car speed estimation, achieving a precision of less than 0.72[m/s] during an extended driving test, solely relying on smartphone accelerometer data without any connectivity to the car.
- [893] arXiv:2401.07470 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Utilizing deep learning models for the identification of enhancers and super-enhancers based on genomic and epigenomic featuresComments: 13 pages, 7 figures, 6 TablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper provides an extensive examination of a sizable dataset of English tweets focusing on nine widely recognized cryptocurrencies, specifically Cardano, Binance, Bitcoin, Dogecoin, Ethereum, Fantom, Matic, Shiba, and Ripple. Our primary objective was to conduct a psycholinguistic and emotion analysis of social media content associated with these cryptocurrencies. To enable investigators to make more informed decisions. The study involved comparing linguistic characteristics across the diverse digital coins, shedding light on the distinctive linguistic patterns that emerge within each coin's community. To achieve this, we utilized advanced text analysis techniques. Additionally, our work unveiled an intriguing Understanding of the interplay between these digital assets within the cryptocurrency community. By examining which coin pairs are mentioned together most frequently in the dataset, we established correlations between different cryptocurrencies. To ensure the reliability of our findings, we initially gathered a total of 832,559 tweets from Twitter. These tweets underwent a rigorous preprocessing stage, resulting in a refined dataset of 115,899 tweets that were used for our analysis. Overall, our research offers valuable Perception into the linguistic nuances of various digital coins' online communities and provides a deeper understanding of their interactions in the cryptocurrency space.
- [894] arXiv:2401.07510 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question AnsweringComments: 50 pages, 3 figures, 3 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: ChatGPT explores a strategic blueprint of question answering (QA) in delivering medical diagnosis, treatment recommendations, and other healthcare support. This is achieved through the increasing incorporation of medical domain data via natural language processing (NLP) and multimodal paradigms. By transitioning the distribution of text, images, videos, and other modalities from the general domain to the medical domain, these techniques have expedited the progress of medical domain question answering (MDQA). They bridge the gap between human natural language and sophisticated medical domain knowledge or expert manual annotations, handling large-scale, diverse, unbalanced, or even unlabeled data analysis scenarios in medical contexts. Central to our focus is the utilizing of language models and multimodal paradigms for medical question answering, aiming to guide the research community in selecting appropriate mechanisms for their specific medical research requirements. Specialized tasks such as unimodal-related question answering, reading comprehension, reasoning, diagnosis, relation extraction, probability modeling, and others, as well as multimodal-related tasks like vision question answering, image caption, cross-modal retrieval, report summarization, and generation, are discussed in detail. Each section delves into the intricate specifics of the respective method under consideration. This paper highlights the structures and advancements of medical domain explorations against general domain methods, emphasizing their applications across different tasks and datasets. It also outlines current challenges and opportunities for future medical domain research, paving the way for continued innovation and application in this rapidly evolving field.
- [895] arXiv:2401.07518 (cross-list from cs.CL) [ pdf , other ]
-
Title: Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future TrendsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Natural Language Processing (NLP) aims to analyze text or speech via techniques in the computer science field. It serves the applications in domains of healthcare, commerce, education and so on. Particularly, NLP has been widely applied to the education domain and its applications have enormous potential to help teaching and learning. In this survey, we review recent advances in NLP with the focus on solving problems relevant to the education domain. In detail, we begin with introducing the related background and the real-world scenarios in education where NLP techniques could contribute. Then, we present a taxonomy of NLP in the education domain and highlight typical NLP applications including question answering, question construction, automated assessment, and error correction. Next, we illustrate the task definition, challenges, and corresponding cutting-edge techniques based on the above taxonomy. In particular, LLM-involved methods are included for discussion due to the wide usage of LLMs in diverse NLP applications. After that, we showcase some off-the-shelf demonstrations in this domain. At last, we conclude with six promising directions for future research, including more datasets in education domain, controllable usage of LLMs, intervention of difficulty-level control, interpretable educational NLP, methods with adaptive learning, and integrated systems for education. We organize all relevant datasets and papers in the open-available Github Link for better review~\url{ this https URL }.
- [896] arXiv:2401.07519 (cross-list from cs.CV) [ pdf , other ]
-
Title: InstantID: Zero-shot Identity-Preserving Generation in SecondsAuthors: Qixun Wang , Xu Bai , Haofan Wang , Zekui Qin , Anthony Chen , Huaxia Li , Xu Tang , Yao HuComments: Technical Report, project page available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: There has been significant progress in personalized image synthesis with methods such as Textual Inversion, DreamBooth, and LoRA. Yet, their real-world applicability is hindered by high storage demands, lengthy fine-tuning processes, and the need for multiple reference images. Conversely, existing ID embedding-based methods, while requiring only a single forward inference, face challenges: they either necessitate extensive fine-tuning across numerous model parameters, lack compatibility with community pre-trained models, or fail to maintain high face fidelity. Addressing these limitations, we introduce InstantID, a powerful diffusion model-based solution. Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image, while ensuring high fidelity. To achieve this, we design a novel IdentityNet by imposing strong semantic and weak spatial conditions, integrating facial and landmark images with textual prompts to steer the image generation. InstantID demonstrates exceptional performance and efficiency, proving highly beneficial in real-world applications where identity preservation is paramount. Moreover, our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin. Our codes and pre-trained checkpoints will be available at this https URL .
- [897] arXiv:2401.07525 (cross-list from cs.CL) [ pdf , other ]
-
Title: TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job FitAuthors: Yihan Cao , Xu Chen , Lun Du , Hao Chen , Qiang Fu , Shi Han , Yushu Du , Yanbin Kang , Guangming Lu , Zi LiComments: ICASSP 2024 camera ready. 5 pages, 1 figure, 3 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Person-job fit is an essential part of online recruitment platforms in serving various downstream applications like Job Search and Candidate Recommendation. Recently, pretrained large language models have further enhanced the effectiveness by leveraging richer textual information in user profiles and job descriptions apart from user behavior features and job metadata. However, the general domain-oriented design struggles to capture the unique structural information within user profiles and job descriptions, leading to a loss of latent semantic correlations. We propose TAROT, a hierarchical multitask co-pretraining framework, to better utilize structural and semantic information for informative text embeddings. TAROT targets semi-structured text in profiles and jobs, and it is co-pretained with multi-grained pretraining tasks to constrain the acquired semantic information at each level. Experiments on a real-world LinkedIn dataset show significant performance improvements, proving its effectiveness in person-job fit tasks.
- [898] arXiv:2401.07526 (cross-list from cs.CL) [ pdf , other ]
-
Title: Editing Arbitrary Propositions in LLMs without Subject LabelsAuthors: Itai Feigenbaum , Devansh Arpit , Huan Wang , Shelby Heinecke , Juan Carlos Niebles , Weiran Yao , Caiming Xiong , Silvio SavareseSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Model (LLM) editing modifies factual information in LLMs. Locate-and-Edit (L\&E) methods accomplish this by finding where relevant information is stored within the neural network, and editing the weights at that location. The goal of editing is to modify the response of an LLM to a proposition independently of its phrasing, while not modifying its response to other related propositions. Existing methods are limited to binary propositions, which represent straightforward binary relations between a subject and an object. Furthermore, existing methods rely on semantic subject labels, which may not be available or even be well-defined in practice. In this paper, we show that both of these issues can be effectively skirted with a simple and fast localization method called Gradient Tracing (GT). This localization method allows editing arbitrary propositions instead of just binary ones, and does so without the need for subject labels. As propositions always have a truth value, our experiments prompt an LLM as a boolean classifier, and edit its T/F response to propositions. Our method applies GT for location tracing, and then edit the model at that location using a mild variant of Rank-One Model Editing (ROME). On datasets of binary propositions derived from the CounterFact dataset, we show that our method -- without access to subject labels -- performs close to state-of-the-art L\&E methods which has access subject labels. We then introduce a new dataset, Factual Accuracy Classification Test (FACT), which includes non-binary propositions and for which subject labels are not generally applicable, and therefore is beyond the scope of existing L\&E methods. Nevertheless, we show that with our method editing is possible on FACT.
- [899] arXiv:2401.07532 (cross-list from cs.SD) [ pdf , other ]
-
Title: Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music GenerationAuthors: Zhiwei Lin , Jun Chen , Boshi Tang , Binzhu Sha , Jing Yang , Yaolong Ju , Fan Fan , Shiyin Kang , Zhiyong Wu , Helen MengComments: Accepted by ICASSP 2024Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: Variational Autoencoders (VAEs) constitute a crucial component of neural symbolic music generation, among which some works have yielded outstanding results and attracted considerable attention. Nevertheless, previous VAEs still encounter issues with overly long feature sequences and generated results lack contextual coherence, thus the challenge of modeling long multi-track symbolic music still remains unaddressed. To this end, we propose Multi-view MidiVAE, as one of the pioneers in VAE methods that effectively model and generate long multi-track symbolic music. The Multi-view MidiVAE utilizes the two-dimensional (2-D) representation, OctupleMIDI, to capture relationships among notes while reducing the feature sequences length. Moreover, we focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy to integrate both Track- and Bar-view MidiVAE features. Objective and subjective experimental results on the CocoChorales dataset demonstrate that, compared to the baseline, Multi-view MidiVAE exhibits significant improvements in terms of modeling long multi-track symbolic music.
- [900] arXiv:2401.07543 (cross-list from cs.CE) [ pdf , other ]
-
Title: Must: Maximizing Latent Capacity of Spatial Transcriptomics DataAuthors: Zelin Zang , Liangyu Li , Yongjie Xu , Chenrui Duan , Kai Wang , Yang You , Yi Sun , Stan Z. LiComments: 30 pages and 6 figures, plus 27 pages and 14 figures in appendicesSubjects: Computational Engineering, Finance, and Science (cs.CE) ; Artificial Intelligence (cs.AI)
Abstract: Spatial transcriptomics (ST) technologies have revolutionized the study of gene expression patterns in tissues by providing multimodality data in transcriptomic, spatial, and morphological, offering opportunities for understanding tissue biology beyond transcriptomics. However, we identify the modality bias phenomenon in ST data species, i.e., the inconsistent contribution of different modalities to the labels leads to a tendency for the analysis methods to retain the information of the dominant modality. How to mitigate the adverse effects of modality bias to satisfy various downstream tasks remains a fundamental challenge. This paper introduces Multiple-modality Structure Transformation, named MuST, a novel methodology to tackle the challenge. MuST integrates the multi-modality information contained in the ST data effectively into a uniform latent space to provide a foundation for all the downstream tasks. It learns intrinsic local structures by topology discovery strategy and topology fusion loss function to solve the inconsistencies among different modalities. Thus, these topology-based and deep learning techniques provide a solid foundation for a variety of analytical tasks while coordinating different modalities. The effectiveness of MuST is assessed by performance metrics and biological significance. The results show that it outperforms existing state-of-the-art methods with clear advantages in the precision of identifying and preserving structures of tissues and biomarkers. MuST offers a versatile toolkit for the intricate analysis of complex biological systems.
- [901] arXiv:2401.07586 (cross-list from cs.CV) [ pdf , other ]
-
Title: Curriculum for Crowd Counting -- Is it Worthy?Comments: Accepted version of the paper in 19th International Conference on Computer Vision Theory and Applications (VISAPP), Rome, Italy, 27-19 February 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recent advances in deep learning techniques have achieved remarkable performance in several computer vision problems. A notably intuitive technique called Curriculum Learning (CL) has been introduced recently for training deep learning models. Surprisingly, curriculum learning achieves significantly improved results in some tasks but marginal or no improvement in others. Hence, there is still a debate about its adoption as a standard method to train supervised learning models. In this work, we investigate the impact of curriculum learning in crowd counting using the density estimation method. We performed detailed investigations by conducting 112 experiments using six different CL settings using eight different crowd models. Our experiments show that curriculum learning improves the model learning performance and shortens the convergence time.
- [902] arXiv:2401.07591 (cross-list from cs.CV) [ pdf , other ]
-
Title: Multimodal Crowd Counting with Pix2Pix GANsComments: Accepted version of the paper in 19th International Conference on Computer Vision Theory and Applications (VISAPP), Rome, Italy, 27-29 Feb, 2024,Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Most state-of-the-art crowd counting methods use color (RGB) images to learn the density map of the crowd. However, these methods often struggle to achieve higher accuracy in densely crowded scenes with poor illumination. Recently, some studies have reported improvement in the accuracy of crowd counting models using a combination of RGB and thermal images. Although multimodal data can lead to better predictions, multimodal data might not be always available beforehand. In this paper, we propose the use of generative adversarial networks (GANs) to automatically generate thermal infrared (TIR) images from color (RGB) images and use both to train crowd counting models to achieve higher accuracy. We use a Pix2Pix GAN network first to translate RGB images to TIR images. Our experiments on several state-of-the-art crowd counting models and benchmark crowd datasets report significant improvement in accuracy.
- [903] arXiv:2401.07595 (cross-list from cs.LG) [ pdf , other ]
-
Title: E3x: $\mathrm{E}(3)$-Equivariant Deep Learning Made EasySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Chemical Physics (physics.chem-ph)
Abstract: This work introduces E3x, a software package for building neural networks that are equivariant with respect to the Euclidean group $\mathrm{E}(3)$, consisting of translations, rotations, and reflections of three-dimensional space. Compared to ordinary neural networks, $\mathrm{E}(3)$-equivariant models promise benefits whenever input and/or output data are quantities associated with three-dimensional objects. This is because the numeric values of such quantities (e.g. positions) typically depend on the chosen coordinate system. Under transformations of the reference frame, the values change predictably, but the underlying rules can be difficult to learn for ordinary machine learning models. With built-in $\mathrm{E}(3)$-equivariance, neural networks are guaranteed to satisfy the relevant transformation rules exactly, resulting in superior data efficiency and accuracy. The code for E3x is available from this https URL , detailed documentation and usage examples can be found on this https URL .
- [904] arXiv:2401.07603 (cross-list from cs.RO) [ pdf , other ]
-
Title: Multi-task real-robot data with gaze attention for dual-arm fine manipulationComments: 10 pages, The dataset is available at this https URLSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: In the field of robotic manipulation, deep imitation learning is recognized as a promising approach for acquiring manipulation skills. Additionally, learning from diverse robot datasets is considered a viable method to achieve versatility and adaptability. In such research, by learning various tasks, robots achieved generality across multiple objects. However, such multi-task robot datasets have mainly focused on single-arm tasks that are relatively imprecise, not addressing the fine-grained object manipulation that robots are expected to perform in the real world. This paper introduces a dataset of diverse object manipulations that includes dual-arm tasks and/or tasks requiring fine manipulation. To this end, we have generated dataset with 224k episodes (150 hours, 1,104 language instructions) which includes dual-arm fine tasks such as bowl-moving, pencil-case opening or banana-peeling, and this data is publicly available. Additionally, this dataset includes visual attention signals as well as dual-action labels, a signal that separates actions into a robust reaching trajectory and precise interaction with objects, and language instructions to achieve robust and precise object manipulation. We applied the dataset to our Dual-Action and Attention (DAA), a model designed for fine-grained dual arm manipulation tasks and robust against covariate shifts. The model was tested with over 7k total trials in real robot manipulation tasks, demonstrating its capability in fine manipulation.
- [905] arXiv:2401.07612 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated ApplicationsAuthors: Xuchen SuoSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The critical challenge of prompt injection attacks in Large Language Models (LLMs) integrated applications, a growing concern in the Artificial Intelligence (AI) field. Such attacks, which manipulate LLMs through natural language inputs, pose a significant threat to the security of these applications. Traditional defense strategies, including output and input filtering, as well as delimiter use, have proven inadequate. This paper introduces the 'Signed-Prompt' method as a novel solution. The study involves signing sensitive instructions within command segments by authorized users, enabling the LLM to discern trusted instruction sources. The paper presents a comprehensive analysis of prompt injection attack patterns, followed by a detailed explanation of the Signed-Prompt concept, including its basic architecture and implementation through both prompt engineering and fine-tuning of LLMs. Experiments demonstrate the effectiveness of the Signed-Prompt method, showing substantial resistance to various types of prompt injection attacks, thus validating its potential as a robust defense strategy in AI security.
- [906] arXiv:2401.07655 (cross-list from cs.SE) [ pdf , other ]
-
Title: MLAD: A Unified Model for Multi-system Log Anomaly DetectionAuthors: Runqiang Zang , Hongcheng Guo , Jian Yang , Jiaheng Liu , Zhoujun Li , Tieqiao Zheng , Xu Shi , Liangfan Zheng , Bo ZhangSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In spite of the rapid advancements in unsupervised log anomaly detection techniques, the current mainstream models still necessitate specific training for individual system datasets, resulting in costly procedures and limited scalability due to dataset size, thereby leading to performance bottlenecks. Furthermore, numerous models lack cognitive reasoning capabilities, posing challenges in direct transferability to similar systems for effective anomaly detection. Additionally, akin to reconstruction networks, these models often encounter the "identical shortcut" predicament, wherein the majority of system logs are classified as normal, erroneously predicting normal classes when confronted with rare anomaly logs due to reconstruction errors.
To address the aforementioned issues, we propose MLAD, a novel anomaly detection model that incorporates semantic relational reasoning across multiple systems. Specifically, we employ Sentence-bert to capture the similarities between log sequences and convert them into highly-dimensional learnable semantic vectors. Subsequently, we revamp the formulas of the Attention layer to discern the significance of each keyword in the sequence and model the overall distribution of the multi-system dataset through appropriate vector space diffusion. Lastly, we employ a Gaussian mixture model to highlight the uncertainty of rare words pertaining to the "identical shortcut" problem, optimizing the vector space of the samples using the maximum expectation model. Experiments on three real-world datasets demonstrate the superiority of MLAD. - [907] arXiv:2401.07709 (cross-list from cs.CV) [ pdf , other ]
-
Title: Towards Efficient Diffusion-Based Image Editing with Instant Attention MasksAuthors: Siyu Zou , Jiji Tang , Yiyi Zhou , Jing He , Chaoyi Zhao , Rongsheng Zhang , Zhipeng Hu , Xiaoshuai SunComments: Accepted by AAAI2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Diffusion-based Image Editing (DIE) is an emerging research hot-spot, which often applies a semantic mask to control the target area for diffusion-based editing. However, most existing solutions obtain these masks via manual operations or off-line processing, greatly reducing their efficiency. In this paper, we propose a novel and efficient image editing method for Text-to-Image (T2I) diffusion models, termed Instant Diffusion Editing(InstDiffEdit). In particular, InstDiffEdit aims to employ the cross-modal attention ability of existing diffusion models to achieve instant mask guidance during the diffusion steps. To reduce the noise of attention maps and realize the full automatics, we equip InstDiffEdit with a training-free refinement scheme to adaptively aggregate the attention distributions for the automatic yet accurate mask generation. Meanwhile, to supplement the existing evaluations of DIE, we propose a new benchmark called Editing-Mask to examine the mask accuracy and local editing ability of existing methods. To validate InstDiffEdit, we also conduct extensive experiments on ImageNet and Imagen, and compare it with a bunch of the SOTA methods. The experimental results show that InstDiffEdit not only outperforms the SOTA methods in both image quality and editing results, but also has a much faster inference speed, i.e., +5 to +6 times.
- [908] arXiv:2401.07729 (cross-list from cs.CV) [ pdf , other ]
-
Title: SSL-Interactions: Pretext Tasks for Interactive Trajectory PredictionComments: 13 pages, 5 figures, submitted to IV-2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: This paper addresses motion forecasting in multi-agent environments, pivotal for ensuring safety of autonomous vehicles. Traditional as well as recent data-driven marginal trajectory prediction methods struggle to properly learn non-linear agent-to-agent interactions. We present SSL-Interactions that proposes pretext tasks to enhance interaction modeling for trajectory prediction. We introduce four interaction-aware pretext tasks to encapsulate various aspects of agent interactions: range gap prediction, closest distance prediction, direction of movement prediction, and type of interaction prediction. We further propose an approach to curate interaction-heavy scenarios from datasets. This curated data has two advantages: it provides a stronger learning signal to the interaction model, and facilitates generation of pseudo-labels for interaction-centric pretext tasks. We also propose three new metrics specifically designed to evaluate predictions in interactive scenes. Our empirical evaluations indicate SSL-Interactions outperforms state-of-the-art motion forecasting methods quantitatively with up to 8% improvement, and qualitatively, for interaction-heavy scenarios.
- [909] arXiv:2401.07796 (cross-list from cs.CV) [ pdf , other ]
-
Title: Fusing Echocardiography Images and Medical Records for Continuous Patient StratificationAuthors: Nathan Painchaud , Pierre-Yves Courand , Pierre-Marc Jodoin , Nicolas Duchateau , Olivier BernardComments: 10 pages, submitted to IEEE TMISubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Deep learning now enables automatic and robust extraction of cardiac function descriptors from echocardiographic sequences, such as ejection fraction or strain. These descriptors provide fine-grained information that physicians consider, in conjunction with more global variables from the clinical record, to assess patients' condition. Drawing on novel transformer models applied to tabular data (e.g., variables from electronic health records), we propose a method that considers all descriptors extracted from medical records and echocardiograms to learn the representation of a difficult-to-characterize cardiovascular pathology, namely hypertension. Our method first projects each variable into its own representation space using modality-specific approaches. These standardized representations of multimodal data are then fed to a transformer encoder, which learns to merge them into a comprehensive representation of the patient through a pretext task of predicting a clinical rating. This pretext task is formulated as an ordinal classification to enforce a pathological continuum in the representation space. We observe the major trends along this continuum for a cohort of 239 hypertensive patients to describe, with unprecedented gradation, the effect of hypertension on a number of cardiac function descriptors. Our analysis shows that i) pretrained weights from a foundation model allow to reach good performance (83% accuracy) even with limited data (less than 200 training samples), ii) trends across the population are reproducible between trainings, and iii) for descriptors whose interactions with hypertension are well documented, patterns are consistent with prior physiological knowledge.
- [910] arXiv:2401.07810 (cross-list from cs.CL) [ pdf , other ]
-
Title: Consolidating Strategies for Countering Hate Speech Using Persuasive DialoguesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Hateful comments are prevalent on social media platforms. Although tools for automatically detecting, flagging, and blocking such false, offensive, and harmful content online have lately matured, such reactive and brute force methods alone provide short-term and superficial remedies while the perpetrators persist. With the public availability of large language models which can generate articulate synthetic and engaging content at scale, there are concerns about the rapid growth of dissemination of such malicious content on the web. There is now a need to focus on deeper, long-term solutions that involve engaging with the human perpetrator behind the source of the content to change their viewpoint or at least bring down the rhetoric using persuasive means. To do that, we propose defining and experimenting with controllable strategies for generating counter-arguments to hateful comments in online conversations. We experiment with controlling response generation using features based on (i) argument structure and reasoning-based Walton argument schemes, (ii) counter-argument speech acts, and (iii) human characteristics-based qualities such as Big-5 personality traits and human values. Using automatic and human evaluations, we determine the best combination of features that generate fluent, argumentative, and logically sound arguments for countering hate. We further share the developed computational models for automatically annotating text with such features, and a silver-standard annotated version of an existing hate speech dialog corpora.
- [911] arXiv:2401.07836 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Two Types of AI Existential Risk: Decisive and AccumulativeAuthors: Atoosa KasirzadehSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The conventional discourse on existential risks (x-risks) from AI typically focuses on abrupt, dire events caused by advanced AI systems, particularly those that might achieve or surpass human-level intelligence. These events have severe consequences that either lead to human extinction or irreversibly cripple human civilization to a point beyond recovery. This discourse, however, often neglects the serious possibility of AI x-risks manifesting incrementally through a series of smaller yet interconnected disruptions, gradually crossing critical thresholds over time. This paper contrasts the conventional "decisive AI x-risk hypothesis" with an "accumulative AI x-risk hypothesis." While the former envisions an overt AI takeover pathway, characterized by scenarios like uncontrollable superintelligence, the latter suggests a different causal pathway to existential catastrophes. This involves a gradual accumulation of critical AI-induced threats such as severe vulnerabilities and systemic erosion of econopolitical structures. The accumulative hypothesis suggests a boiling frog scenario where incremental AI risks slowly converge, undermining resilience until a triggering event results in irreversible collapse. Through systems analysis, this paper examines the distinct assumptions differentiating these two hypotheses. It is then argued that the accumulative view reconciles seemingly incompatible perspectives on AI risks. The implications of differentiating between these causal pathways -- the decisive and the accumulative -- for the governance of AI risks as well as long-term AI safety are discussed.
- [912] arXiv:2401.07844 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian NoiseSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of strong law of large numbers and a commonly used V4 Lyapunov drift condition and trivially holds if the Markov chain is finite and irreducible.
- [913] arXiv:2401.07862 (cross-list from eess.SY) [ pdf , other ]
-
Title: Adaptive Neural-Operator Backstepping Control of a Benchmark Hyperbolic PDEComments: 16.5 pages, 3 figuresSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Dynamical Systems (math.DS); Optimization and Control (math.OC)
Abstract: To stabilize PDEs, feedback controllers require gain kernel functions, which are themselves governed by PDEs. Furthermore, these gain-kernel PDEs depend on the PDE plants' functional coefficients. The functional coefficients in PDE plants are often unknown. This requires an adaptive approach to PDE control, i.e., an estimation of the plant coefficients conducted concurrently with control, where a separate PDE for the gain kernel must be solved at each timestep upon the update in the plant coefficient function estimate. Solving a PDE at each timestep is computationally expensive and a barrier to the implementation of real-time adaptive control of PDEs. Recently, results in neural operator (NO) approximations of functional mappings have been introduced into PDE control, for replacing the computation of the gain kernel with a neural network that is trained, once offline, and reused in real-time for rapid solution of the PDEs. In this paper, we present the first result on applying NOs in adaptive PDE control, presented for a benchmark 1-D hyperbolic PDE with recirculation. We establish global stabilization via Lyapunov analysis, in the plant and parameter error states, and also present an alternative approach, via passive identifiers, which avoids the strong assumptions on kernel differentiability. We then present numerical simulations demonstrating stability and observe speedups up to three orders of magnitude, highlighting the real-time efficacy of neural operators in adaptive control. Our code (Github) is made publicly available for future researchers.
- [914] arXiv:2401.07868 (cross-list from cs.RO) [ pdf , other ]
-
Title: Consolidating Trees of Robotic Plans Generated Using Large Language Models to Improve ReliabilitySubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The inherent probabilistic nature of Large Language Models (LLMs) introduces an element of unpredictability, raising concerns about potential discrepancies in their output. This paper introduces an innovative approach aims to generate correct and optimal robotic task plans for diverse real-world demands and scenarios. LLMs have been used to generate task plans, but they are unreliable and may contain wrong, questionable, or high-cost steps. The proposed approach uses LLM to generate a number of task plans as trees and amalgamates them into a graph by removing questionable paths. Then an optimal task tree can be retrieved to circumvent questionable and high-cost nodes, thereby improving planning accuracy and execution efficiency. The approach is further improved by incorporating a large knowledge network. Leveraging GPT-4 further, the high-level task plan is converted into a low-level Planning Domain Definition Language (PDDL) plan executable by a robot. Evaluation results highlight the superior accuracy and efficiency of our approach compared to previous methodologies in the field of task planning.
- [915] arXiv:2401.07870 (cross-list from cs.CL) [ pdf , other ]
-
Title: JumpCoder: Go Beyond Autoregressive Coder via Online ModificationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: While existing code large language models (code LLMs) exhibit impressive capabilities in code generation, their autoregressive sequential generation inherently lacks reversibility. This limitation hinders them from timely correcting previous missing statements during coding as humans do, often leading to error propagation and suboptimal performance. We introduce JumpCoder, a novel modelagnostic framework that enables online modification and non-sequential generation to augment the code LLMs. The key idea behind JumpCoder is to insert new code into the currently generated code when necessary during generation, which is achieved through an auxiliary infilling model that works in tandem with the code LLM. Since identifying the best infill position beforehand is intractable, we adopt an infill-first, judge-later strategy, which experiments with filling at the $k$ most critical positions following the generation of each line, and uses an Abstract Syntax Tree (AST) parser alongside the Generation Model Scoring to effectively judge the validity of each potential infill. Extensive experiments using six state-of-the-art code LLMs across multiple benchmarks consistently indicate significant improvements over all baselines. Notably, JumpCoder assists code LLMs in achieving up to a 3.6% increase in Pass@1 for Python, 6.3% for Java, and 3.7% for C++ in the multilingual HumanEval benchmarks. Our code is public at this https URL .
- [916] arXiv:2401.07877 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: EMBRE: Entity-aware Masking for Biomedical Relation ExtractionComments: 5 pages, 1 figureSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Information extraction techniques, including named entity recognition (NER) and relation extraction (RE), are crucial in many domains to support making sense of vast amounts of unstructured text data by identifying and connecting relevant information. Such techniques can assist researchers in extracting valuable insights. In this paper, we introduce the Entity-aware Masking for Biomedical Relation Extraction (EMBRE) method for biomedical relation extraction, as applied in the context of the BioRED challenge Task 1, in which human-annotated entities are provided as input. Specifically, we integrate entity knowledge into a deep neural network by pretraining the backbone model with an entity masking objective. We randomly mask named entities for each instance and let the model identify the masked entity along with its type. In this way, the model is capable of learning more specific knowledge and more robust representations. Then, we utilize the pre-trained model as our backbone to encode language representations and feed these representations into two multilayer perceptron (MLPs) to predict the logits for relation and novelty, respectively. The experimental results demonstrate that our proposed method can improve the performances of entity pair, relation and novelty extraction over our baseline.
- [917] arXiv:2401.07883 (cross-list from cs.LG) [ pdf , other ]
-
Title: The Chronicles of RAG: The Retriever, the Chunk and the GeneratorAuthors: Paulo Finardi , Leonardo Avila , Rodrigo Castaldoni , Pedro Gengo , Celio Larcher , Marcos Piau , Pablo Costa , Vinicius CaridáComments: 16 pages, 15 figures, 9 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract: Retrieval Augmented Generation (RAG) has become one of the most popular paradigms for enabling LLMs to access external data, and also as a mechanism for grounding to mitigate against hallucinations. When implementing RAG you can face several challenges like effective integration of retrieval models, efficient representation learning, data diversity, computational efficiency optimization, evaluation, and quality of text generation. Given all these challenges, every day a new technique to improve RAG appears, making it unfeasible to experiment with all combinations for your problem. In this context, this paper presents good practices to implement, optimize, and evaluate RAG for the Brazilian Portuguese language, focusing on the establishment of a simple pipeline for inference and experiments. We explored a diverse set of methods to answer questions about the first Harry Potter book. To generate the answers we used the OpenAI's gpt-4, gpt-4-1106-preview, gpt-3.5-turbo-1106, and Google's Gemini Pro. Focusing on the quality of the retriever, our approach achieved an improvement of MRR@10 by 35.4% compared to the baseline. When optimizing the input size in the application, we observed that it is possible to further enhance it by 2.4%. Finally, we present the complete architecture of the RAG with our recommendations. As result, we moved from a baseline of 57.88% to a maximum relative score of 98.61%.
- [918] arXiv:2401.07886 (cross-list from cs.LG) [ pdf , other ]
-
Title: Learned Best-Effort LLM ServingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Many applications must provide low-latency LLM service to users or risk unacceptable user experience. However, over-provisioning resources to serve fluctuating request patterns is often prohibitively expensive. In this work, we present a best-effort serving system that employs deep reinforcement learning to adjust service quality based on the task distribution and system load. Our best-effort system can maintain availability with over 10x higher client request rates, serves above 96% of peak performance 4.1x more often, and serves above 98% of peak performance 2.3x more often than static serving on unpredictable workloads. Our learned router is robust to shifts in both the arrival and task distribution. Compared to static serving, learned best-effort serving allows for cost-efficient serving through increased hardware utility. Additionally, we argue that learned best-effort LLM serving is applicable in wide variety of settings and provides application developers great flexibility to meet their specific needs.
- [919] arXiv:2401.07889 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Machine Learning Techniques to Identify Hand Gestures amidst Forearm Muscle SignalsComments: 21 pages, 7 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: This study investigated the use of forearm EMG data for distinguishing eight hand gestures, employing the Neural Network and Random Forest algorithms on data from ten participants. The Neural Network achieved 97 percent accuracy with 1000-millisecond windows, while the Random Forest achieved 85 percent accuracy with 200-millisecond windows. Larger window sizes improved gesture classification due to increased temporal resolution. The Random Forest exhibited faster processing at 92 milliseconds, compared to the Neural Network's 124 milliseconds. In conclusion, the study identified a Neural Network with a 1000-millisecond stream as the most accurate (97 percent), and a Random Forest with a 200-millisecond stream as the most efficient (85 percent). Future research should focus on increasing sample size, incorporating more hand gestures, and exploring different feature extraction methods and modeling algorithms to enhance system accuracy and efficiency.
- [920] arXiv:2401.07927 (cross-list from cs.CL) [ pdf , other ]
-
Title: Are self-explanations from Large Language Models faithful?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Instruction-tuned Large Language Models (LLMs) excel at many tasks and will even explain their reasoning, so-called self-explanations. However, convincing and wrong self-explanations can lead to unsupported confidence in LLMs, thus increasing risk. Therefore, it's important to measure if self-explanations truly reflect the model's behavior. Such a measure is called interpretability-faithfulness and is challenging to perform since the ground truth is inaccessible, and many LLMs only have an inference API. To address this, we propose employing self-consistency checks to measure faithfulness. For example, if an LLM says a set of words is important for making a prediction, then it should not be able to make its prediction without these words. While self-consistency checks are a common approach to faithfulness, they have not previously been successfully applied to LLM self-explanations for counterfactual, importance measure, and redaction explanations. Our results demonstrate that faithfulness is explanation, model, and task-dependent, showing self-explanations should not be trusted in general. For example, with sentiment classification, counterfactuals are more faithful for Llama2, importance measures for Mistral, and redaction for Falcon 40B.
- [921] arXiv:2401.07931 (cross-list from cs.CV) [ pdf , other ]
-
Title: Vertical Federated Image SegmentationComments: 11 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract: With the popularization of AI solutions for image based problems, there has been a growing concern for both data privacy and acquisition. In a large number of cases, information is located on separate data silos and it can be difficult for a developer to consolidate all of it in a fashion that is appropriate for machine learning model development. Alongside this, a portion of these localized data regions may not have access to a labelled ground truth. This indicates that they have the capacity to reach conclusions numerically, but are not able to assign classifications amid a lack of pertinent information. Such a determination is often negligible, especially when attempting to develop image based solutions that often necessitate this capability. With this being the case, we propose an innovative vertical federated learning (VFL) model architecture that can operate under this common set of conditions. This is the first (and currently the only) implementation of a system that can work under the constraints of a VFL environment and perform image segmentation while maintaining nominal accuracies. We achieved this by utilizing an FCN that boasts the ability to operate on federates that lack labelled data and privately share the respective weights with a central server, that of which hosts the necessary features for classification. Tests were conducted on the CamVid dataset in order to determine the impact of heavy feature compression required for the transfer of information between federates, as well as to reach nominal conclusions about the overall performance metrics when working under such constraints.
- [922] arXiv:2401.07955 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Study on Large Language Models' Limitations in Multiple-Choice Question AnsweringSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The widespread adoption of Large Language Models (LLMs) has become commonplace, particularly with the emergence of open-source models. More importantly, smaller models are well-suited for integration into consumer devices and are frequently employed either as standalone solutions or as subroutines in various AI tasks. Despite their ubiquitous use, there is no systematic analysis of their specific capabilities and limitations. In this study, we tackle one of the most widely used tasks - answering Multiple Choice Question (MCQ). We analyze 26 small open-source models and find that 65% of the models do not understand the task, only 4 models properly select an answer from the given choices, and only 5 of these models are choice order independent. These results are rather alarming given the extensive use of MCQ tests with these models. We recommend exercising caution and testing task understanding before using MCQ to evaluate LLMs in any field whatsoever.
- [923] arXiv:2401.07993 (cross-list from cs.LG) [ pdf , other ]
-
Title: Carrying over algorithm in transformersAuthors: Jorrit KruthoffComments: Comments welcome!Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Addition is perhaps one of the simplest arithmetic tasks one can think of and is usually performed using the carrying over algorithm. This algorithm consists of two tasks: adding digits in the same position and carrying over a one whenever necessary. We study how transformer models implement this algorithm and how the two aforementioned tasks are allocated to different parts of the network. We first focus on two-layer encoder-only models and show that the carrying over algorithm is implemented in a modular fashion. The first layer is mostly responsible for adding digits in the same position. The second layer first decides, in the attention, which positions need a carried one or not, and then performs the carrying of the one in the final MLP. We provide a simple way of precisely identifying which neurons are responsible for that task. This implementation of the carrying over algorithm occurs across a range of hyperparameters for two as well as three-layer models. For small decoder-only models, we observe the same implementation and provide suggestive evidence for its existence in three 7B large language models.
- [924] arXiv:2401.08003 (cross-list from cs.CV) [ pdf , other ]
-
Title: Jewelry Recognition via Encoder-Decoder ModelsComments: 6 pages, 5 figures, MetroXRAINE 2023 ConferenceJournal-ref: 2023 IEEE International Conference on Metrology for Extended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), Milano, Italy, 2023, pp. 116-121Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Jewelry recognition is a complex task due to the different styles and designs of accessories. Precise descriptions of the various accessories is something that today can only be achieved by experts in the field of jewelry. In this work, we propose an approach for jewelry recognition using computer vision techniques and image captioning, trying to simulate this expert human behavior of analyzing accessories. The proposed methodology consist on using different image captioning models to detect the jewels from an image and generate a natural language description of the accessory. Then, this description is also utilized to classify the accessories at different levels of detail. The generated caption includes details such as the type of jewel, color, material, and design. To demonstrate the effectiveness of the proposed method in accurately recognizing different types of jewels, a dataset consisting of images of accessories belonging to jewelry stores in Córdoba (Spain) has been created. After testing the different image captioning architectures designed, the final model achieves a captioning accuracy of 95\%. The proposed methodology has the potential to be used in various applications such as jewelry e-commerce, inventory management or automatic jewels recognition to analyze people's tastes and social status.
- [925] arXiv:2401.08014 (cross-list from cs.CV) [ pdf , other ]
-
Title: Convolutional Neural Network Compression via Dynamic Parameter Rank PruningComments: 11 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: While Convolutional Neural Networks (CNNs) excel at learning complex latent-space representations, their over-parameterization can lead to overfitting and reduced performance, particularly with limited data. This, alongside their high computational and memory demands, limits the applicability of CNNs for edge deployment. Low-rank matrix approximation has emerged as a promising approach to reduce CNN parameters, but its application presents challenges including rank selection and performance loss. To address these issues, we propose an efficient training method for CNN compression via dynamic parameter rank pruning. Our approach integrates efficient matrix factorization and novel regularization techniques, forming a robust framework for dynamic rank reduction and model compression. We use Singular Value Decomposition (SVD) to model low-rank convolutional filters and dense weight matrices and we achieve model compression by training the SVD factors with back-propagation in an end-to-end way. We evaluate our method on an array of modern CNNs, including ResNet-18, ResNet-20, and ResNet-32, and datasets like CIFAR-10, CIFAR-100, and ImageNet (2012), showcasing its applicability in computer vision. Our experiments show that the proposed method can yield substantial storage savings while maintaining or even enhancing classification performance.
- [926] arXiv:2401.08046 (cross-list from cs.CL) [ pdf , other ]
-
Title: Enhancing Robustness of LLM-Synthetic Text Detectors for Academic Writing: A Comprehensive AnalysisSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The emergence of large language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4) used by ChatGPT, has profoundly impacted the academic and broader community. While these models offer numerous advantages in terms of revolutionizing work and study methods, they have also garnered significant attention due to their potential negative consequences. One example is generating academic reports or papers with little to no human contribution. Consequently, researchers have focused on developing detectors to address the misuse of LLMs. However, most existing methods prioritize achieving higher accuracy on restricted datasets, neglecting the crucial aspect of generalizability. This limitation hinders their practical application in real-life scenarios where reliability is paramount. In this paper, we present a comprehensive analysis of the impact of prompts on the text generated by LLMs and highlight the potential lack of robustness in one of the current state-of-the-art GPT detectors. To mitigate these issues concerning the misuse of LLMs in academic writing, we propose a reference-based Siamese detector named Synthetic-Siamese which takes a pair of texts, one as the inquiry and the other as the reference. Our method effectively addresses the lack of robustness of previous detectors (OpenAI detector and DetectGPT) and significantly improves the baseline performances in realistic academic writing scenarios by approximately 67% to 95%.
- [927] arXiv:2401.08066 (cross-list from cs.CV) [ pdf , other ]
-
Title: Achieve Fairness without Demographics for Dermatological Disease DiagnosisSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In medical image diagnosis, fairness has become increasingly crucial. Without bias mitigation, deploying unfair AI would harm the interests of the underprivileged population and potentially tear society apart. Recent research addresses prediction biases in deep learning models concerning demographic groups (e.g., gender, age, and race) by utilizing demographic (sensitive attribute) information during training. However, many sensitive attributes naturally exist in dermatological disease images. If the trained model only targets fairness for a specific attribute, it remains unfair for other attributes. Moreover, training a model that can accommodate multiple sensitive attributes is impractical due to privacy concerns. To overcome this, we propose a method enabling fair predictions for sensitive attributes during the testing phase without using such information during training. Inspired by prior work highlighting the impact of feature entanglement on fairness, we enhance the model features by capturing the features related to the sensitive and target attributes and regularizing the feature entanglement between corresponding classes. This ensures that the model can only classify based on the features related to the target attribute without relying on features associated with sensitive attributes, thereby improving fairness and accuracy. Additionally, we use disease masks from the Segment Anything Model (SAM) to enhance the quality of the learned feature. Experimental results demonstrate that the proposed method can improve fairness in classification compared to state-of-the-art methods in two dermatological disease datasets.
- [928] arXiv:2401.08077 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Transformer-based approach for Ethereum Price Prediction Using Crosscurrency correlation and Sentiment AnalysisComments: 12 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Pricing of Securities (q-fin.PR)
Abstract: The research delves into the capabilities of a transformer-based neural network for Ethereum cryptocurrency price forecasting. The experiment runs around the hypothesis that cryptocurrency prices are strongly correlated with other cryptocurrencies and the sentiments around the cryptocurrency. The model employs a transformer architecture for several setups from single-feature scenarios to complex configurations incorporating volume, sentiment, and correlated cryptocurrency prices. Despite a smaller dataset and less complex architecture, the transformer model surpasses ANN and MLP counterparts on some parameters. The conclusion presents a hypothesis on the illusion of causality in cryptocurrency price movements driven by sentiments.
- [929] arXiv:2401.08089 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Study on Training and Developing Large Language Models for Behavior Tree GenerationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: This paper presents an innovative exploration of the application potential of large language models (LLM) in addressing the challenging task of automatically generating behavior trees (BTs) for complex tasks. The conventional manual BT generation method is inefficient and heavily reliant on domain expertise. On the other hand, existing automatic BT generation technologies encounter bottlenecks related to task complexity, model adaptability, and reliability. In order to overcome these challenges, we propose a novel methodology that leverages the robust representation and reasoning abilities of LLMs. The core contribution of this paper lies in the design of a BT generation framework based on LLM, which encompasses the entire process, from data synthesis and model training to application developing and data verification. Synthetic data is introduced to train the BT generation model (BTGen model), enhancing its understanding and adaptability to various complex tasks, thereby significantly improving its overall performance. In order to ensure the effectiveness and executability of the generated BTs, we emphasize the importance of data verification and introduce a multilevel verification strategy. Additionally, we explore a range of agent design and development schemes with LLM as the central element. We hope that the work in this paper may provide a reference for the researchers who are interested in BT generation based on LLMs.
- [930] arXiv:2401.08092 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Survey of Resource-efficient LLM and Multimodal Foundation ModelsAuthors: Mengwei Xu , Wangsong Yin , Dongqi Cai , Rongjie Yi , Daliang Xu , Qipeng Wang , Bingyang Wu , Yihao Zhao , Chen Yang , Shihe Wang , Qiyang Zhang , Zhenyan Lu , Li Zhang , Shangguang Wang , Yuanchun Li , Yunxin Liu , Xin Jin , Xuanzhe LiuSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come at a significant cost in terms of hardware resources. To support the growth of these large models in a scalable and environmentally sustainable way, there has been a considerable focus on developing resource-efficient strategies. This survey delves into the critical importance of such research, examining both algorithmic and systemic aspects. It offers a comprehensive analysis and valuable insights gleaned from existing literature, encompassing a broad array of topics from cutting-edge model architectures and training/serving algorithms to practical system designs and implementations. The goal of this survey is to provide an overarching understanding of how current approaches are tackling the resource challenges posed by large foundation models and to potentially inspire future breakthroughs in this field.
- [931] arXiv:2401.08095 (cross-list from cs.SD) [ pdf , other ]
-
Title: DurFlex-EVC: Duration-Flexible Emotional Voice Conversion with Parallel GenerationComments: 13 pages, 9 figures, 8 tablesSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: Emotional voice conversion (EVC) seeks to modify the emotional tone of a speaker's voice while preserving the original linguistic content and the speaker's unique vocal characteristics. Recent advancements in EVC have involved the simultaneous modeling of pitch and duration, utilizing the potential of sequence-to-sequence (seq2seq) models. To enhance reliability and efficiency in conversion, this study shifts focus towards parallel speech generation. We introduce Duration-Flexible EVC (DurFlex-EVC), which integrates a style autoencoder and unit aligner. Traditional models, while incorporating self-supervised learning (SSL) representations that contain both linguistic and paralinguistic information, have neglected this dual nature, leading to reduced controllability. Addressing this issue, we implement cross-attention to synchronize these representations with various emotions. Additionally, a style autoencoder is developed for the disentanglement and manipulation of style elements. The efficacy of our approach is validated through both subjective and objective evaluations, establishing its superiority over existing models in the field.
- [932] arXiv:2401.08097 (cross-list from cs.SE) [ pdf , other ]
-
Title: A Study of Fairness Concerns in AI-based Mobile App ReviewsAuthors: Ali Rezaei Nasab , Maedeh Dashti , Mojtaba Shahin , Mansooreh Zahedi , Hourieh Khalajzadeh , Chetan Arora , Peng LiangComments: 25 pages, 4 images, 2 tables, Manuscript submitted to a Journal (2024)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Fairness is one of the socio-technical concerns that must be addressed in AI-based systems. Unfair AI-based systems, particularly unfair AI-based mobile apps, can pose difficulties for a significant proportion of the global population. This paper aims to analyze fairness concerns in AI-based app reviews.We first manually constructed a ground-truth dataset, including a statistical sample of fairness and non-fairness reviews. Leveraging the ground-truth dataset, we developed and evaluated a set of machine learning and deep learning classifiers that distinguish fairness reviews from non-fairness reviews. Our experiments show that our best-performing classifier can detect fairness reviews with a precision of 94%. We then applied the best-performing classifier on approximately 9.5M reviews collected from 108 AI-based apps and identified around 92K fairness reviews. Next, applying the K-means clustering technique to the 92K fairness reviews, followed by manual analysis, led to the identification of six distinct types of fairness concerns (e.g., 'receiving different quality of features and services in different platforms and devices' and 'lack of transparency and fairness in dealing with user-generated content'). Finally, the manual analysis of 2,248 app owners' responses to the fairness reviews identified six root causes (e.g., 'copyright issues') that app owners report to justify fairness concerns.
- [933] arXiv:2401.08099 (cross-list from cs.CV) [ pdf , other ]
-
Title: Inpainting Normal Maps for Lightstage dataComments: 8 pages, 4 figures, CGVC Conference, The Eurographics AssociationJournal-ref: Computer Graphics and Visual Computing (CGVC), 2023, pp. 45-52Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: This study introduces a novel method for inpainting normal maps using a generative adversarial network (GAN). Normal maps, often derived from a lightstage, are crucial in performance capture but can have obscured areas due to movement (e.g., by arms, hair, or props). Inpainting fills these missing areas with plausible data. Our approach extends previous general image inpainting techniques, employing a bow tie-like generator network and a discriminator network, with alternating training phases. The generator aims to synthesize images aligning with the ground truth and deceive the discriminator, which differentiates between real and processed images. Periodically, the discriminator undergoes retraining to enhance its ability to identify processed images. Importantly, our method adapts to the unique characteristics of normal map data, necessitating modifications to the loss function. We utilize a cosine loss instead of mean squared error loss for generator training. Limited training data availability, even with synthetic datasets, demands significant augmentation, considering the specific nature of the input data. This includes appropriate image flipping and in-plane rotations to accurately alter normal vectors. Throughout training, we monitored key metrics such as average loss, Structural Similarity Index Measure (SSIM), and Peak Signal-to-Noise Ratio (PSNR) for the generator, along with average loss and accuracy for the discriminator. Our findings suggest that the proposed model effectively generates high-quality, realistic inpainted normal maps, suitable for performance capture applications. These results establish a foundation for future research, potentially involving more advanced networks and comparisons with inpainting of source images used to create the normal maps.
- [934] arXiv:2401.08100 (cross-list from cs.CV) [ pdf , other ]
-
Title: KTVIC: A Vietnamese Image Captioning Dataset on the Life DomainSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Image captioning is a crucial task with applications in a wide range of domains, including healthcare and education. Despite extensive research on English image captioning datasets, the availability of such datasets for Vietnamese remains limited, with only two existing datasets. In this study, we introduce KTVIC, a comprehensive Vietnamese Image Captioning dataset focused on the life domain, covering a wide range of daily activities. This dataset comprises 4,327 images and 21,635 Vietnamese captions, serving as a valuable resource for advancing image captioning in the Vietnamese language. We conduct experiments using various deep neural networks as the baselines on our dataset, evaluating them using the standard image captioning metrics, including BLEU, METEOR, CIDEr, and ROUGE. Our findings underscore the effectiveness of the proposed dataset and its potential contributions to the field of image captioning in the Vietnamese context.
- [935] arXiv:2401.08103 (cross-list from cs.CY) [ pdf , other ]
-
Title: Resolving Ethics Trade-offs in Implementing Responsible AISubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: While the operationalisation of high-level AI ethics principles into practical AI/ML systems has made progress, there is still a theory-practice gap in managing tensions between the underlying AI ethics aspects. We cover five approaches for addressing the tensions via trade-offs, ranging from rudimentary to complex. The approaches differ in the types of considered context, scope, methods for measuring contexts, and degree of justification. None of the approaches is likely to be appropriate for all organisations, systems, or applications. To address this, we propose a framework which consists of: (i) proactive identification of tensions, (ii) prioritisation and weighting of ethics aspects, (iii) justification and documentation of trade-off decisions. The proposed framework aims to facilitate the implementation of well-rounded AI/ML systems that are appropriate for potential regulatory requirements.
- [936] arXiv:2401.08105 (cross-list from cs.CV) [ pdf , other ]
-
Title: Hardware Acceleration for Real-Time Wildfire Detection Onboard Drone NetworksComments: 6 pages, 7 figures, NETROBOTICS conference submissionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Early wildfire detection in remote and forest areas is crucial for minimizing devastation and preserving ecosystems. Autonomous drones offer agile access to remote, challenging terrains, equipped with advanced imaging technology that delivers both high-temporal and detailed spatial resolution, making them valuable assets in the early detection and monitoring of wildfires. However, the limited computation and battery resources of Unmanned Aerial Vehicles (UAVs) pose significant challenges in implementing robust and efficient image classification models. Current works in this domain often operate offline, emphasizing the need for solutions that can perform inference in real time, given the constraints of UAVs. To address these challenges, this paper aims to develop a real-time image classification and fire segmentation model. It presents a comprehensive investigation into hardware acceleration using the Jetson Nano P3450 and the implications of TensorRT, NVIDIA's high-performance deep-learning inference library, on fire classification accuracy and speed. The study includes implementations of Quantization Aware Training (QAT), Automatic Mixed Precision (AMP), and post-training mechanisms, comparing them against the latest baselines for fire segmentation and classification. All experiments utilize the FLAME dataset - an image dataset collected by low-altitude drones during a prescribed forest fire. This work contributes to the ongoing efforts to enable real-time, on-board wildfire detection capabilities for UAVs, addressing speed and the computational and energy constraints of these crucial monitoring systems. The results show a 13% increase in classification speed compared to similar models without hardware optimization. Comparatively, loss and accuracy are within 1.225% of the original values.
- [937] arXiv:2401.08115 (cross-list from cs.CV) [ pdf , other ]
-
Title: No-Clean-Reference Image Super-Resolution: Application to Electron MicroscopyComments: 13 pages, 12 figures, and 2 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The inability to acquire clean high-resolution (HR) electron microscopy (EM) images over a large brain tissue volume hampers many neuroscience studies. To address this challenge, we propose a deep-learning-based image super-resolution (SR) approach to computationally reconstruct clean HR 3D-EM with a large field of view (FoV) from noisy low-resolution (LR) acquisition. Our contributions are I) Investigating training with no-clean references for $\ell_2$ and $\ell_1$ loss functions; II) Introducing a novel network architecture, named EMSR, for enhancing the resolution of LR EM images while reducing inherent noise; and, III) Comparing different training strategies including using acquired LR and HR image pairs, i.e., real pairs with no-clean references contaminated with real corruptions, the pairs of synthetic LR and acquired HR, as well as acquired LR and denoised HR pairs. Experiments with nine brain datasets showed that training with real pairs can produce high-quality super-resolved results, demonstrating the feasibility of training with non-clean references for both loss functions. Additionally, comparable results were observed, both visually and numerically, when employing denoised and noisy references for training. Moreover, utilizing the network trained with synthetically generated LR images from HR counterparts proved effective in yielding satisfactory SR results, even in certain cases, outperforming training with real pairs. The proposed SR network was compared quantitatively and qualitatively with several established SR techniques, showcasing either the superiority or competitiveness of the proposed method in mitigating noise while recovering fine details.
- [938] arXiv:2401.08117 (cross-list from cs.CV) [ pdf , other ]
-
Title: E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep LearningComments: Accepted in AAAI2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Abstract: The bio-inspired event cameras or dynamic vision sensors are capable of asynchronously capturing per-pixel brightness changes (called event-streams) in high temporal resolution and high dynamic range. However, the non-structural spatial-temporal event-streams make it challenging for providing intuitive visualization with rich semantic information for human vision. It calls for events-to-video (E2V) solutions which take event-streams as input and generate high quality video frames for intuitive visualization. However, current solutions are predominantly data-driven without considering the prior knowledge of the underlying statistics relating event-streams and video frames. It highly relies on the non-linearity and generalization capability of the deep neural networks, thus, is struggling on reconstructing detailed textures when the scenes are complex. In this work, we propose \textbf{E2HQV}, a novel E2V paradigm designed to produce high-quality video frames from events. This approach leverages a model-aided deep learning framework, underpinned by a theory-inspired E2V model, which is meticulously derived from the fundamental imaging principles of event cameras. To deal with the issue of state-reset in the recurrent components of E2HQV, we also design a temporal shift embedding module to further improve the quality of the video frames. Comprehensive evaluations on the real world event camera datasets validate our approach, with E2HQV, notably outperforming state-of-the-art approaches, e.g., surpassing the second best by over 40\% for some evaluation metrics.
- [939] arXiv:2401.08121 (cross-list from cs.LG) [ pdf , other ]
-
Title: CycLight: learning traffic signal cooperation with a cycle-level strategySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: This study introduces CycLight, a novel cycle-level deep reinforcement learning (RL) approach for network-level adaptive traffic signal control (NATSC) systems. Unlike most traditional RL-based traffic controllers that focus on step-by-step decision making, CycLight adopts a cycle-level strategy, optimizing cycle length and splits simultaneously using Parameterized Deep Q-Networks (PDQN) algorithm. This cycle-level approach effectively reduces the computational burden associated with frequent data communication, meanwhile enhancing the practicality and safety of real-world applications. A decentralized framework is formulated for multi-agent cooperation, while attention mechanism is integrated to accurately assess the impact of the surroundings on the current intersection. CycLight is tested in a large synthetic traffic grid using the microscopic traffic simulation tool, SUMO. Experimental results not only demonstrate the superiority of CycLight over other state-of-the-art approaches but also showcase its robustness against information transmission delays.
- [940] arXiv:2401.08138 (cross-list from cs.SE) [ pdf , other ]
-
Title: LLMs for Test Input Generation for Semantic CachesAuthors: Zafaryab Rasool , Scott Barnett , David Willie , Stefanus Kurniawan , Sherwin Balugo , Srikanth Thudumu , Mohamed AbdelrazekComments: Accepted in International Conference on AI Engineering Software Engineering (CAIN 2024)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) enable state-of-the-art semantic capabilities to be added to software systems such as semantic search of unstructured documents and text generation. However, these models are computationally expensive. At scale, the cost of serving thousands of users increases massively affecting also user experience. To address this problem, semantic caches are used to check for answers to similar queries (that may have been phrased differently) without hitting the LLM service. Due to the nature of these semantic cache techniques that rely on query embeddings, there is a high chance of errors impacting user confidence in the system. Adopting semantic cache techniques usually requires testing the effectiveness of a semantic cache (accurate cache hits and misses) which requires a labelled test set of similar queries and responses which is often unavailable. In this paper, we present VaryGen, an approach for using LLMs for test input generation that produces similar questions from unstructured text documents. Our novel approach uses the reasoning capabilities of LLMs to 1) adapt queries to the domain, 2) synthesise subtle variations to queries, and 3) evaluate the synthesised test dataset. We evaluated our approach in the domain of a student question and answer system by qualitatively analysing 100 generated queries and result pairs, and conducting an empirical case study with an open source semantic cache. Our results show that query pairs satisfy human expectations of similarity and our generated data demonstrates failure cases of a semantic cache. Additionally, we also evaluate our approach on Qasper dataset. This work is an important first step into test input generation for semantic applications and presents considerations for practitioners when calibrating a semantic cache.
- [941] arXiv:2401.08185 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: DPAFNet:Dual Path Attention Fusion Network for Single Image DerainingAuthors: Bingcai WeiSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Rainy weather will have a significant impact on the regular operation of the imaging system. Based on this premise, image rain removal has always been a popular branch of low-level visual tasks, especially methods using deep neural networks. However, most neural networks are but-branched, such as only using convolutional neural networks or Transformers, which is unfavourable for the multidimensional fusion of image features. In order to solve this problem, this paper proposes a dual-branch attention fusion network. Firstly, a two-branch network structure is proposed. Secondly, an attention fusion module is proposed to selectively fuse the features extracted by the two branches rather than simply adding them. Finally, complete ablation experiments and sufficient comparison experiments prove the rationality and effectiveness of the proposed method.
- [942] arXiv:2401.08194 (cross-list from cs.CV) [ pdf , other ]
-
Title: End-to-End Optimized Image Compression with the Frequency-Oriented TransformComments: 25 pages, accepted by MVAPSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Abstract: Image compression constitutes a significant challenge amidst the era of information explosion. Recent studies employing deep learning methods have demonstrated the superior performance of learning-based image compression methods over traditional codecs. However, an inherent challenge associated with these methods lies in their lack of interpretability. Following an analysis of the varying degrees of compression degradation across different frequency bands, we propose the end-to-end optimized image compression model facilitated by the frequency-oriented transform. The proposed end-to-end image compression model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, aligning with the human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments are conducted to demonstrate that our model outperforms all traditional codecs including next-generation standard H.266/VVC on MS-SSIM metric. Moreover, visual analysis tasks (i.e., object detection and semantic segmentation) are conducted to verify the proposed compression method could preserve semantic fidelity besides signal-level precision.
- [943] arXiv:2401.08221 (cross-list from cs.LG) [ pdf , other ]
-
Title: Towards Causal Relationship in Indefinite Data: Baseline Model and New DatasetsComments: If you are interested in the two new datasets, pls contact us by emailSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Integrating deep learning and causal discovery has encouraged us to spot that learning causal structures and representations in dialogue and video is full of challenges. We defined These data forms as "Indefinite Data", characterized by multi-structure data and multi-value representations. Unlike existing adaptable data forms, Indefinite Data still faces gaps in datasets and methods. To address the dataset gap, we release two high-quality datasets - Causalogue and Causaction, containing text dialogue samples and video action samples with causal annotations respectively. Moreover, the method gap arises from the coexistence of multi-structure data and multi-value representations, breaking the assumptions of all current methods and rendering them infeasible on Indefinite Data. To this end, we propose a probabilistic framework as a baseline, incorporating three designed highlights for this gap: 1) establishing Causation Condition of representations using the independence of noise terms under non-fixed causal structures, 2) treating causal strength as a latent variable and measuring the reconstruction loss in the correlation space, and 3) estimating the effects of latent confounders. These highpoints make the probabilistic model capable of overcoming challenges brought by the coexistence of multi-structure data and multi-value representations and pave the way for the extension of latent confounders. Comprehensive experiments have evaluated baseline results of causal structures, causal representations, and confounding disentanglement.
- [944] arXiv:2401.08255 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Generative Adversarial Attack for Multilingual Text ClassifiersComments: AAAI-24 Workshop on Artificial Intelligence for Cyber Security (AICS)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Current adversarial attack algorithms, where an adversary changes a text to fool a victim model, have been repeatedly shown to be effective against text classifiers. These attacks, however, generally assume that the victim model is monolingual and cannot be used to target multilingual victim models, a significant limitation given the increased use of these models. For this reason, in this work we propose an approach to fine-tune a multilingual paraphrase model with an adversarial objective so that it becomes able to generate effective adversarial examples against multilingual classifiers. The training objective incorporates a set of pre-trained models to ensure text quality and language consistency of the generated text. In addition, all the models are suitably connected to the generator by vocabulary-mapping matrices, allowing for full end-to-end differentiability of the overall training pipeline. The experimental validation over two multilingual datasets and five languages has shown the effectiveness of the proposed approach compared to existing baselines, particularly in terms of query efficiency. We also provide a detailed analysis of the generated attacks and discuss limitations and opportunities for future research.
- [945] arXiv:2401.08261 (cross-list from cs.CR) [ pdf , other ]
-
Title: Probabilistically Robust Watermarking of Neural NetworksSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: As deep learning (DL) models are widely and effectively used in Machine Learning as a Service (MLaaS) platforms, there is a rapidly growing interest in DL watermarking techniques that can be used to confirm the ownership of a particular model. Unfortunately, these methods usually produce watermarks susceptible to model stealing attacks. In our research, we introduce a novel trigger set-based watermarking approach that demonstrates resilience against functionality stealing attacks, particularly those involving extraction and distillation. Our approach does not require additional model training and can be applied to any model architecture. The key idea of our method is to compute the trigger set, which is transferable between the source model and the set of proxy models with a high probability. In our experimental study, we show that if the probability of the set being transferable is reasonably high, it can be effectively used for ownership verification of the stolen model. We evaluate our method on multiple benchmarks and show that our approach outperforms current state-of-the-art watermarking techniques in all considered experimental setups.
- [946] arXiv:2401.08273 (cross-list from cs.CL) [ pdf , other ]
-
Title: Large Language Models are Null-Shot LearnersComments: 28 pages; added Gemini Pro results, error analysis, and a discussion on confabulationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper presents null-shot prompting. Null-shot prompting exploits hallucination in large language models (LLMs) by instructing LLMs to utilize information from the "Examples" section that never exists within the provided context to perform a task. While reducing hallucination is crucial and non-negligible for daily and critical uses of LLMs, we propose that in the current landscape in which these LLMs still hallucinate, it is possible, in fact, to exploit hallucination to increase performance in performing tasks compared to standard zero-shot prompting. Experiments with eight LLMs show improvements in performance across the majority of eight datasets, including reading comprehension, arithmetic reasoning, and closed-book question answering. The observed inconsistency in increased relative performance across the LLMs also potentially indicates a different degree of inherent hallucination in each model. These differences show that it is possible to utilize null-shot prompting as a way to detect degrees of hallucination in LLMs using existing benchmarking datasets. We also perform ablation studies, including experimenting with a modified version of null-shot prompting that incorporates ideas from zero-shot chain-of-thought prompting, which shows different trends of results.
- [947] arXiv:2401.08326 (cross-list from cs.CL) [ pdf , other ]
-
Title: RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool LearningAuthors: Junjie Ye , Yilong Wu , Songyang Gao , Caishuang Huang , Sixian Li , Guanyu Li , Xiaoran Fan , Qi Zhang , Tao Gui , Xuanjing HuangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Tool learning has generated widespread interest as a vital means of interaction between Large Language Models (LLMs) and the physical world. Current research predominantly emphasizes LLMs' capacity to utilize tools in well-structured environments while overlooking their stability when confronted with the inevitable noise of the real world. To bridge this gap, we introduce RoTBench, a multi-level benchmark for evaluating the robustness of LLMs in tool learning. Specifically, we establish five external environments, each featuring varying levels of noise (i.e., Clean, Slight, Medium, Heavy, and Union), providing an in-depth analysis of the model's resilience across three critical phases: tool selection, parameter identification, and content filling. Experiments involving six widely-used models underscore the urgent necessity for enhancing the robustness of LLMs in tool learning. For instance, the performance of GPT-4 even drops significantly from 80.00 to 58.10 when there is no substantial change in manual accuracy. More surprisingly, the noise correction capability inherent in the GPT family paradoxically impedes its adaptability in the face of mild noise. In light of these findings, we propose RoTTuning, a strategy that enriches the diversity of training environments to bolster the robustness of LLMs in tool learning. The code and data are available at this https URL .
- [948] arXiv:2401.08330 (cross-list from cs.LG) [ pdf , other ]
-
Title: Boosting Gradient Ascent for Continuous DR-submodular MaximizationAuthors: Qixin Zhang , Zongqi Wan , Zengde Deng , Zaiyi Chen , Xiaoming Sun , Jialin Zhang , Yu YangComments: 74 pages, 6 figures and 9 tables. An extended version of Stochastic Continuous Submodular Maximization: Boosting via Non-oblivious Function (ICML 2022)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: Projected Gradient Ascent (PGA) is the most commonly used optimization scheme in machine learning and operations research areas. Nevertheless, numerous studies and examples have shown that the PGA methods may fail to achieve the tight approximation ratio for continuous DR-submodular maximization problems. To address this challenge, we present a boosting technique in this paper, which can efficiently improve the approximation guarantee of the standard PGA to \emph{optimal} with only small modifications on the objective function. The fundamental idea of our boosting technique is to exploit non-oblivious search to derive a novel auxiliary function $F$, whose stationary points are excellent approximations to the global maximum of the original DR-submodular objective $f$. Specifically, when $f$ is monotone and $\gamma$-weakly DR-submodular, we propose an auxiliary function $F$ whose stationary points can provide a better $(1-e^{-\gamma})$-approximation than the $(\gamma^2/(1+\gamma^2))$-approximation guaranteed by the stationary points of $f$ itself. Similarly, for the non-monotone case, we devise another auxiliary function $F$ whose stationary points can achieve an optimal $\frac{1-\min_{\boldsymbol{x}\in\mathcal{C}}\|\boldsymbol{x}\|_{\infty}}{4}$-approximation guarantee where $\mathcal{C}$ is a convex constraint set. In contrast, the stationary points of the original non-monotone DR-submodular function can be arbitrarily bad~\citep{chen2023continuous}. Furthermore, we demonstrate the scalability of our boosting technique on four problems. In all of these four problems, our resulting variants of boosting PGA algorithm beat the previous standard PGA in several aspects such as approximation ratio and efficiency. Finally, we corroborate our theoretical findings with numerical experiments, which demonstrate the effectiveness of our boosting PGA methods.
- [949] arXiv:2401.08358 (cross-list from cs.CL) [ pdf , other ]
-
Title: Hallucination Detection and Hallucination Mitigation: An InvestigationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs), including ChatGPT, Bard, and Llama, have achieved remarkable successes over the last two years in a range of different applications. In spite of these successes, there exist concerns that limit the wide application of LLMs. A key problem is the problem of hallucination. Hallucination refers to the fact that in addition to correct responses, LLMs can also generate seemingly correct but factually incorrect responses. This report aims to present a comprehensive review of the current literature on both hallucination detection and hallucination mitigation. We hope that this report can serve as a good reference for both engineers and researchers who are interested in LLMs and applying them to real world tasks.
- [950] arXiv:2401.08376 (cross-list from cs.SE) [ pdf , other ]
-
Title: KADEL: Knowledge-Aware Denoising Learning for Commit Message GenerationComments: Accepted to ACM Transactions on Software Engineering and Methodology 2024 (TOSEM'24)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Commit messages are natural language descriptions of code changes, which are important for software evolution such as code understanding and maintenance. However, previous methods are trained on the entire dataset without considering the fact that a portion of commit messages adhere to good practice (i.e., good-practice commits), while the rest do not. On the basis of our empirical study, we discover that training on good-practice commits significantly contributes to the commit message generation. Motivated by this finding, we propose a novel knowledge-aware denoising learning method called KADEL. Considering that good-practice commits constitute only a small proportion of the dataset, we align the remaining training samples with these good-practice commits. To achieve this, we propose a model that learns the commit knowledge by training on good-practice commits. This knowledge model enables supplementing more information for training samples that do not conform to good practice. However, since the supplementary information may contain noise or prediction errors, we propose a dynamic denoising training method. This method composes a distribution-aware confidence function and a dynamic distribution list, which enhances the effectiveness of the training process. Experimental results on the whole MCMD dataset demonstrate that our method overall achieves state-of-the-art performance compared with previous methods. Our source code and data are available at this https URL
- [951] arXiv:2401.08383 (cross-list from cs.LG) [ pdf , other ]
-
Title: Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model InferenceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: In large language models like the Generative Pre-trained Transformer, the Mixture of Experts paradigm has emerged as a powerful technique for enhancing model expressiveness and accuracy. However, deploying GPT MoE models for parallel inference on distributed systems presents significant challenges, primarily due to the extensive Alltoall communication required for expert routing and aggregation. This communication bottleneck exacerbates the already complex computational landscape, hindering the efficient utilization of high-performance computing resources. In this paper, we propose a lightweight optimization technique called ExFlow, to largely accelerate the inference of these MoE models. We take a new perspective on alleviating the communication overhead by exploiting the inter-layer expert affinity. Unlike previous methods, our solution can be directly applied to pre-trained MoE models without any fine-tuning or accuracy degradation. By proposing a context-coherent expert parallelism on distributed systems, our design only uses one Alltoall communication to deliver the same functionality while previous methods all require two Alltoalls. By carefully examining the conditional probability in tokens' routing across multiple layers, we proved that pre-trained GPT MoE models implicitly exhibit a strong inter-layer expert affinity. We then design an efficient integer programming model to capture such features and show that by properly placing the experts on corresponding GPUs, we can reduce up to 67% cross-GPU routing latency. Our solution beats the cutting-edge MoE implementations with experts from 8 to 64, with up to 2.2x improvement in inference throughput. We further provide a detailed study of how the model implicitly acquires this expert affinity at the very early training stage and how this affinity evolves and stabilizes during training.
- [952] arXiv:2401.08386 (cross-list from cs.LG) [ pdf , other ]
-
Title: Deep Learning-based Group Causal Inference in Multivariate Time-seriesComments: Accepted in AAAI24 (AI4TS)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Causal inference in a nonlinear system of multivariate timeseries is instrumental in disentangling the intricate web of relationships among variables, enabling us to make more accurate predictions and gain deeper insights into real-world complex systems. Causality methods typically identify the causal structure of a multivariate system by considering the cause-effect relationship of each pair of variables while ignoring the collective effect of a group of variables or interactions involving more than two-time series variables. In this work, we test model invariance by group-level interventions on the trained deep networks to infer causal direction in groups of variables, such as climate and ecosystem, brain networks, etc. Extensive testing with synthetic and real-world time series data shows a significant improvement of our method over other applied group causality methods and provides us insights into real-world time series. The code for our method can be found at: this https URL .
- [953] arXiv:2401.08396 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Hidden Flaws Behind Expert-Level Accuracy of GPT-4 Vision in MedicineAuthors: Qiao Jin , Fangyuan Chen , Yiliang Zhou , Ziyang Xu , Justin M. Cheung , Robert Chen , Ronald M. Summers , Justin F. Rousseau , Peiyun Ni , Marc J Landsman , Sally L. Baxter , Subhi J. Al'Aref , Yijia Li , Alex Chen , Josef A. Brejt , Michael F. Chiang , Yifan Peng , Zhiyong LuComments: Under reviewSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Recent studies indicate that Generative Pre-trained Transformer 4 with Vision (GPT-4V) outperforms human physicians in medical challenge tasks. However, these evaluations primarily focused on the accuracy of multi-choice questions alone. Our study extends the current scope by conducting a comprehensive analysis of GPT-4V's rationales of image comprehension, recall of medical knowledge, and step-by-step multimodal reasoning when solving New England Journal of Medicine (NEJM) Image Challenges - an imaging quiz designed to test the knowledge and diagnostic capabilities of medical professionals. Evaluation results confirmed that GPT-4V performs comparatively to human physicians regarding multi-choice accuracy (81.6% vs. 77.8%). GPT-4V also performs well in cases where physicians incorrectly answer, with over 78% accuracy. However, we discovered that GPT-4V frequently presents flawed rationales in cases where it makes the correct final choices (35.5%), most prominent in image comprehension (27.2%). Regardless of GPT-4V's high accuracy in multi-choice questions, our findings emphasize the necessity for further in-depth evaluations of its rationales before integrating such multimodal AI models into clinical workflows.
- [954] arXiv:2401.08397 (cross-list from cs.AR) [ pdf , other ]
-
Title: A Micro Architectural Events Aware Real-Time Embedded System Fault InjectorSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: In contemporary times, the increasing complexity of the system poses significant challenges to the reliability, trustworthiness, and security of the SACRES. Key issues include the susceptibility to phenomena such as instantaneous voltage spikes, electromagnetic interference, neutron strikes, and out-of-range temperatures. These factors can induce switch state changes in transistors, resulting in bit-flipping, soft errors, and transient corruption of stored data in memory. The occurrence of soft errors, in turn, may lead to system faults that can propel the system into a hazardous state. Particularly in critical sectors like automotive, avionics, or aerospace, such malfunctions can have real-world implications, potentially causing harm to individuals.
This paper introduces a novel fault injector designed to facilitate the monitoring, aggregation, and examination of micro-architectural events. This is achieved by harnessing the microprocessor's PMU and the debugging interface, specifically focusing on ensuring the repeatability of fault injections. The fault injection methodology targets bit-flipping within the memory system, affecting CPU registers and RAM. The outcomes of these fault injections enable a thorough analysis of the impact of soft errors and establish a robust correlation between the identified faults and the essential timing predictability demanded by SACRES. - [955] arXiv:2401.08405 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Interrogating AI: Characterizing Emergent Playful Interactions with ChatGPTComments: 17 pagesSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: In an era of AI's growing capabilities and influences, recent advancements are reshaping HCI and CSCW's view of AI as mere tools. Playful interactions with AI systems naturally emerged as a way for users to make sense of the ever-changing technology. However, these emergent and playful interactions are underexamined. We target this gap by investigating playful interactions exhibited by users of a recently trending powerful AI technology, ChatGPT. Through a thematic analysis of 372 user-generated posts on the ChatGPT subreddit, we found that a substantial portion of user discourse revolves around playful interactions. The analysis further allowed us to construct a preliminary taxonomy to describe these interactions, categorizing them into six types: reflecting, jesting, imitating, challenging, tricking, and contriving; each included sub-categories. Overall, this study contributes to the field of HCI and CSCW by illuminating the multifaceted nature of playful interactions with AI, underlining their significance in shaping the human-AI relationship.
- [956] arXiv:2401.08425 (cross-list from cs.CV) [ pdf , other ]
-
Title: U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscriptsAuthors: Silvia Zottin , Axel De Nardin , Emanuela Colombi , Claudio Piciarelli , Filippo Pavan , Gian Luca ForestiComments: Neural Comput & Applic (2024)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Document Layout Analysis, which is the task of identifying different semantic regions inside of a document page, is a subject of great interest for both computer scientists and humanities scholars as it represents a fundamental step towards further analysis tasks for the former and a powerful tool to improve and facilitate the study of the documents for the latter. However, many of the works currently present in the literature, especially when it comes to the available datasets, fail to meet the needs of both worlds and, in particular, tend to lean towards the needs and common practices of the computer science side, leading to resources that are not representative of the humanities real needs. For this reason, the present paper introduces U-DIADS-Bib, a novel, pixel-precise, non-overlapping and noiseless document layout analysis dataset developed in close collaboration between specialists in the fields of computer vision and humanities. Furthermore, we propose a novel, computer-aided, segmentation pipeline in order to alleviate the burden represented by the time-consuming process of manual annotation, necessary for the generation of the ground truth segmentation maps. Finally, we present a standardized few-shot version of the dataset (U-DIADS-BibFS), with the aim of encouraging the development of models and solutions able to address this task with as few samples as possible, which would allow for more effective use in a real-world scenario, where collecting a large number of segmentations is not always feasible.
- [957] arXiv:2401.08429 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Machine Translation with Large Language Models: Prompt Engineering for Persian, English, and Russian DirectionsComments: 34 pages, 46 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: Generative large language models (LLMs) have demonstrated exceptional proficiency in various natural language processing (NLP) tasks, including machine translation, question answering, text summarization, and natural language understanding.
To further enhance the performance of LLMs in machine translation, we conducted an investigation into two popular prompting methods and their combination, focusing on cross-language combinations of Persian, English, and Russian. We employed n-shot feeding and tailored prompting frameworks. Our findings indicate that multilingual LLMs like PaLM exhibit human-like machine translation outputs, enabling superior fine-tuning of desired translation nuances in accordance with style guidelines and linguistic considerations. These models also excel in processing and applying prompts. However, the choice of language model, machine translation task, and the specific source and target languages necessitate certain considerations when adopting prompting frameworks and utilizing n-shot in-context learning.
Furthermore, we identified errors and limitations inherent in popular LLMs as machine translation tools and categorized them based on various linguistic metrics. This typology of errors provides valuable insights for utilizing LLMs effectively and offers methods for designing prompts for in-context learning. Our report aims to contribute to the advancement of machine translation with LLMs by improving both the accuracy and reliability of evaluation metrics. - [958] arXiv:2401.08438 (cross-list from cs.CL) [ pdf , other ]
-
Title: CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Cognitive dynamics are pivotal to advance human understanding of the world. Recent advancements in large language models (LLMs) reveal their potential for cognitive simulation. However, these LLM-based cognitive studies primarily focus on static modeling, overlooking the dynamic nature of cognition. To bridge this gap, we propose the concept of the cognitive dynamics of LLMs and present a corresponding task with the inspiration of longitudinal studies. Towards the task, we develop CogBench, a novel benchmark to assess the cognitive dynamics of LLMs and validate it through participant surveys. We also design two evaluation metrics for CogBench, including Authenticity and Rationality. Recognizing the inherent static nature of LLMs, we introduce CogGPT for the task, which features an innovative iterative cognitive mechanism aimed at enhancing lifelong cognitive dynamics. Empirical results demonstrate the superiority of CogGPT over existing methods, particularly in its ability to facilitate role-specific cognitive dynamics under continuous information flows.
- [959] arXiv:2401.08458 (cross-list from cs.CR) [ pdf , other ]
-
Title: Security and Privacy Issues and Solutions in Federated Learning for Digital HealthcareJournal-ref: International Conference on Future Data and Security Engineering (2022) 316-331Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The advent of Federated Learning has enabled the creation of a high-performing model as if it had been trained on a considerable amount of data. A multitude of participants and a server cooperatively train a model without the need for data disclosure or collection. The healthcare industry, where security and privacy are paramount, can substantially benefit from this new learning paradigm, as data collection is no longer feasible due to stringent data policies. Nonetheless, unaddressed challenges and insufficient attack mitigation are hampering its adoption. Attack surfaces differ from traditional centralized learning in that the server and clients communicate between each round of training. In this paper, we thus present vulnerabilities, attacks, and defenses based on the widened attack surfaces, as well as suggest promising new research directions toward a more robust FL.
- [960] arXiv:2401.08478 (cross-list from cs.LG) [ pdf , other ]
-
Title: Solving Continual Offline Reinforcement Learning with Decision TransformerComments: 11 pages, 6 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Continuous offline reinforcement learning (CORL) combines continuous and offline reinforcement learning, enabling agents to learn multiple tasks from static datasets without forgetting prior tasks. However, CORL faces challenges in balancing stability and plasticity. Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing. We aim to investigate whether Decision Transformer (DT), another offline RL paradigm, can serve as a more suitable offline continuous learner to address these issues. We first compare AC-based offline algorithms with DT in the CORL framework. DT offers advantages in learning efficiency, distribution shift mitigation, and zero-shot generalization but exacerbates the forgetting problem during supervised parameter updates. We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem. MH-DT stores task-specific knowledge using multiple heads, facilitating knowledge sharing with common components. It employs distillation and selective rehearsal to enhance current task learning when a replay buffer is available. In buffer-unavailable scenarios, LoRA-DT merges less influential weights and fine-tunes DT's decisive MLP layer to adapt to the current task. Extensive experiments on MoJuCo and Meta-World benchmarks demonstrate that our methods outperform SOTA CORL baselines and showcase enhanced learning capabilities and superior memory efficiency.
- [961] arXiv:2401.08505 (cross-list from cs.LG) [ pdf , other ]
-
Title: Harnessing Orthogonality to Train Low-Rank Neural NetworksAuthors: Daniel Coquelin , Katharina Flügel , Marie Weiel , Nicholas Kiefer , Charlotte Debus , Achim Streit , Markus GötzSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This study explores the learning dynamics of neural networks by analyzing the singular value decomposition (SVD) of their weights throughout training. Our investigation reveals that an orthogonal basis within each multidimensional weight's SVD representation stabilizes during training. Building upon this, we introduce Orthogonality-Informed Adaptive Low-Rank (OIALR) training, a novel training method exploiting the intrinsic orthogonality of neural networks. OIALR seamlessly integrates into existing training workflows with minimal accuracy loss, as demonstrated by benchmarking on various datasets and well-established network architectures. With appropriate hyperparameter tuning, OIALR can surpass conventional training setups, including those of state-of-the-art models.
- [962] arXiv:2401.08527 (cross-list from cs.CV) [ pdf , other ]
-
Title: MICA: Towards Explainable Skin Lesion Diagnosis via Multi-Level Image-Concept AlignmentSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Black-box deep learning approaches have showcased significant potential in the realm of medical image analysis. However, the stringent trustworthiness requirements intrinsic to the medical field have catalyzed research into the utilization of Explainable Artificial Intelligence (XAI), with a particular focus on concept-based methods. Existing concept-based methods predominantly apply concept annotations from a single perspective (e.g., global level), neglecting the nuanced semantic relationships between sub-regions and concepts embedded within medical images. This leads to underutilization of the valuable medical information and may cause models to fall short in harmoniously balancing interpretability and performance when employing inherently interpretable architectures such as Concept Bottlenecks. To mitigate these shortcomings, we propose a multi-modal explainable disease diagnosis framework that meticulously aligns medical images and clinical-related concepts semantically at multiple strata, encompassing the image level, token level, and concept level. Moreover, our method allows for model intervention and offers both textual and visual explanations in terms of human-interpretable concepts. Experimental results on three skin image datasets demonstrate that our method, while preserving model interpretability, attains high performance and label efficiency for concept detection and disease diagnosis.
- [963] arXiv:2401.08534 (cross-list from cs.LG) [ pdf , other ]
-
Title: DiConStruct: Causal Concept-based Explanations through Black-Box DistillationAuthors: Ricardo Moreira , Jacopo Bono , Mário Cardoso , Pedro Saleiro , Mário A. T. Figueiredo , Pedro BizarroComments: Accepted at Conference on Causal Learning and Reasoning (CLeaR 2024, this https URL ). To be published at Proceedings of Machine Learning Research (PMLR)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the performance of the predictive task. Despite the rapid advances in AI explainability in recent years, as far as we know to date, no method fulfills these three properties. Indeed, mainstream methods for local concept explainability do not produce causal explanations and incur a trade-off between explainability and prediction performance. We present DiConStruct, an explanation method that is both concept-based and causal, with the goal of creating more interpretable local explanations in the form of structural causal models and concept attributions. Our explainer works as a distillation model to any black-box machine learning model by approximating its predictions while producing the respective explanations. Because of this, DiConStruct generates explanations efficiently while not impacting the black-box prediction task. We validate our method on an image dataset and a tabular dataset, showing that DiConStruct approximates the black-box models with higher fidelity than other concept explainability baselines, while providing explanations that include the causal relations between the concepts.
- [964] arXiv:2401.08552 (cross-list from cs.LG) [ pdf , other ]
-
Title: Explaining Time Series via Contrastive and Locally Sparse PerturbationsAuthors: Zichuan Liu , Yingying Zhang , Tianchun Wang , Zefan Wang , Dongsheng Luo , Mengnan Du , Min Wu , Yi Wang , Chunlin Chen , Lunting Fan , Qingsong WenComments: Accepted by International Conference on Learning Representations (ICLR 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Explaining multivariate time series is a compound challenge, as it requires identifying important locations in the time series and matching complex temporal patterns. Although previous saliency-based methods addressed the challenges, their perturbation may not alleviate the distribution shift issue, which is inevitable especially in heterogeneous samples. We present ContraLSP, a locally sparse model that introduces counterfactual samples to build uninformative perturbations but keeps distribution using contrastive learning. Furthermore, we incorporate sample-specific sparse gates to generate more binary-skewed and smooth masks, which easily integrate temporal trends and select the salient features parsimoniously. Empirical studies on both synthetic and real-world datasets show that ContraLSP outperforms state-of-the-art models, demonstrating a substantial improvement in explanation quality for time series data. The source code is available at \url{ this https URL }.
- [965] arXiv:2401.08567 (cross-list from cs.LG) [ pdf , other ]
-
Title: Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal DataComments: Published at ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Building cross-modal applications is challenging due to limited paired multi-modal data. Recent works have shown that leveraging a pre-trained multi-modal contrastive representation space enables cross-modal tasks to be learned from uni-modal data. This is based on the assumption that contrastive optimization makes embeddings from different modalities interchangeable. However, this assumption is under-explored due to the poorly understood geometry of the multi-modal contrastive space, where a modality gap exists. In our study, we provide a theoretical explanation of this space's geometry and introduce a three-step method, $C^3$ (Connect, Collapse, Corrupt), to bridge the modality gap, enhancing the interchangeability of embeddings. Our $C^3$ method significantly improves cross-modal learning from uni-modal data, achieving state-of-the-art results on zero-shot image / audio / video captioning and text-to-image generation.
- [966] arXiv:2401.08577 (cross-list from cs.CV) [ pdf , other ]
-
Title: MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D WorldComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: Human beings possess the capability to multiply a melange of multisensory cues while actively exploring and interacting with the 3D world. Current multi-modal large language models, however, passively absorb sensory data as inputs, lacking the capacity to actively interact with the objects in the 3D environment and dynamically collect their multisensory information. To usher in the study of this area, we propose MultiPLY, a multisensory embodied large language model that could incorporate multisensory interactive data, including visual, audio, tactile, and thermal information into large language models, thereby establishing the correlation among words, actions, and percepts. To this end, we first collect Multisensory Universe, a large-scale multisensory interaction dataset comprising 500k data by deploying an LLM-powered embodied agent to engage with the 3D environment. To perform instruction tuning with pre-trained LLM on such generated data, we first encode the 3D scene as abstracted object-centric representations and then introduce action tokens denoting that the embodied agent takes certain actions within the environment, as well as state tokens that represent the multisensory state observations of the agent at each time step. In the inference time, MultiPLY could generate action tokens, instructing the agent to take the action in the environment and obtain the next multisensory state observation. The observation is then appended back to the LLM via state tokens to generate subsequent text or action tokens. We demonstrate that MultiPLY outperforms baselines by a large margin through a diverse set of embodied tasks involving object retrieval, tool use, multisensory captioning, and task decomposition.
- [967] arXiv:2401.08579 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Curve-based Neural Style TransferSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: This research presents a new parametric style transfer framework specifically designed for curve-based design sketches. In this research, traditional challenges faced by neural style transfer methods in handling binary sketch transformations are effectively addressed through the utilization of parametric shape-editing rules, efficient curve-to-pixel conversion techniques, and the fine-tuning of VGG19 on ImageNet-Sketch, enhancing its role as a feature pyramid network for precise style extraction. By harmonizing intuitive curve-based imagery with rule-based editing, this study holds the potential to significantly enhance design articulation and elevate the practice of style transfer within the realm of product design.
- [968] arXiv:2401.08581 (cross-list from cs.CV) [ pdf , other ]
-
Title: Temporal Embeddings: Scalable Self-Supervised Temporal Representation Learning from Spatiotemporal Data for Multimodal Computer VisionComments: Extended abstract accepted for presentation at BayLearn 2023. 3 pages, 7 figures. Abstract based on IEEE IGARSS 2023 research track paper: arXiv:2304.13143Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: There exists a correlation between geospatial activity temporal patterns and type of land use. A novel self-supervised approach is proposed to stratify landscape based on mobility activity time series. First, the time series signal is transformed to the frequency domain and then compressed into task-agnostic temporal embeddings by a contractive autoencoder, which preserves cyclic temporal patterns observed in time series. The pixel-wise embeddings are converted to image-like channels that can be used for task-based, multimodal modeling of downstream geospatial tasks using deep semantic segmentation. Experiments show that temporal embeddings are semantically meaningful representations of time series data and are effective across different tasks such as classifying residential area and commercial areas. Temporal embeddings transform sequential, spatiotemporal motion trajectory data into semantically meaningful image-like tensor representations that can be combined (multimodal fusion) with other data modalities that are or can be transformed into image-like tensor representations (for e.g., RBG imagery, graph embeddings of road networks, passively collected imagery like SAR, etc.) to facilitate multimodal learning in geospatial computer vision. Multimodal computer vision is critical for training machine learning models for geospatial feature detection to keep a geospatial mapping service up-to-date in real-time and can significantly improve user experience and above all, user safety.
- [969] arXiv:2401.08584 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Nahid: AI-based Algorithm for operating fully-automatic surgeryAuthors: Sina SaadatiComments: 8 pages, 10 figures, 1 tableSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Robotics (cs.RO); Image and Video Processing (eess.IV)
Abstract: In this paper, for the first time, a method is presented that can provide a fully automated surgery based on software and computer vision techniques. Then, the advantages and challenges of computerization of medical surgery are examined. Finally, the surgery related to isolated ovarian endometriosis disease has been examined, and based on the presented method, a more detailed algorithm is presented that is capable of automatically diagnosing and treating this disease during surgery as proof of our proposed method where a U-net is trained to detect the endometriosis during surgery.
- [970] arXiv:2401.08587 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Automatic extraction and 3D reconstruction of split wire from point cloud data based on improved DPC algorithmAuthors: Jia ChengSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In order to solve the problem of point cloud data splitting improved by DPC algorithm, a research on automatic separation and 3D reconstruction of point cloud data split lines is proposed. First, the relative coordinates of each point in the cloud point are calculated. Second, it is planned to develop a relative ensemble-based DPC swarm algorithm for analyzing the number of separation lines to determine all parts in the cloud content. Finally, fit each separator using the least squares method. iron. The cloud point of the resulting split subconductors has a clear demarcation line, and the distance between adjacent split subconductors is 0.45 m, divided by the four vertices of the square.
- [971] arXiv:2401.08604 (cross-list from cs.CV) [ pdf , other ]
-
Title: SAM4UDASS: When SAM Meets Unsupervised Domain Adaptive Semantic Segmentation in Intelligent VehiclesComments: 10 pages,9 figures,9 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Semantic segmentation plays a critical role in enabling intelligent vehicles to comprehend their surrounding environments. However, deep learning-based methods usually perform poorly in domain shift scenarios due to the lack of labeled data for training. Unsupervised domain adaptation (UDA) techniques have emerged to bridge the gap across different driving scenes and enhance model performance on unlabeled target environments. Although self-training UDA methods have achieved state-of-the-art results, the challenge of generating precise pseudo-labels persists. These pseudo-labels tend to favor majority classes, consequently sacrificing the performance of rare classes or small objects like traffic lights and signs. To address this challenge, we introduce SAM4UDASS, a novel approach that incorporates the Segment Anything Model (SAM) into self-training UDA methods for refining pseudo-labels. It involves Semantic-Guided Mask Labeling, which assigns semantic labels to unlabeled SAM masks using UDA pseudo-labels. Furthermore, we devise fusion strategies aimed at mitigating semantic granularity inconsistency between SAM masks and the target domain. SAM4UDASS innovatively integrate SAM with UDA for semantic segmentation in driving scenes and seamlessly complements existing self-training UDA methodologies. Extensive experiments on synthetic-to-real and normal-to-adverse driving datasets demonstrate its effectiveness. It brings more than 3% mIoU gains on GTA5-to-Cityscapes, SYNTHIA-to-Cityscapes, and Cityscapes-to-ACDC when using DAFormer and achieves SOTA when using MIC. The code will be available at this https URL .
- [972] arXiv:2401.08619 (cross-list from cs.LG) [ pdf , other ]
-
Title: MATE-Pred: Multimodal Attention-based TCR-Epitope interaction PredictorComments: Patent pending: U.S. Provisional Application No. 63/603,952Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: An accurate binding affinity prediction between T-cell receptors and epitopes contributes decisively to develop successful immunotherapy strategies. Some state-of-the-art computational methods implement deep learning techniques by integrating evolutionary features to convert the amino acid residues of cell receptors and epitope sequences into numerical values, while some other methods employ pre-trained language models to summarize the embedding vectors at the amino acid residue level to obtain sequence-wise representations.
Here, we propose a highly reliable novel method, MATE-Pred, that performs multi-modal attention-based prediction of T-cell receptors and epitopes binding affinity. The MATE-Pred is compared and benchmarked with other deep learning models that leverage multi-modal representations of T-cell receptors and epitopes. In the proposed method, the textual representation of proteins is embedded with a pre-trained bi-directional encoder model and combined with two additional modalities: a) a comprehensive set of selected physicochemical properties; b) predicted contact maps that estimate the 3D distances between amino acid residues in the sequences.
The MATE-Pred demonstrates the potential of multi-modal model in achieving state-of-the-art performance (+8.4\% MCC, +5.5\% AUC compared to baselines) and efficiently capturing contextual, physicochemical, and structural information from amino acid residues. The performance of MATE-Pred projects its potential application in various drug discovery regimes. - [973] arXiv:2401.08623 (cross-list from cs.NE) [ pdf , other ]
-
Title: Wake-Sleep Consolidated LearningAuthors: Amelia Sorrenti , Giovanni Bellitto , Federica Proietto Salanitri , Matteo Pennisi , Simone Palazzo , Concetto SpampinatoSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: We propose Wake-Sleep Consolidated Learning (WSCL), a learning strategy leveraging Complementary Learning System theory and the wake-sleep phases of the human brain to improve the performance of deep neural networks for visual classification tasks in continual learning settings. Our method learns continually via the synchronization between distinct wake and sleep phases. During the wake phase, the model is exposed to sensory input and adapts its representations, ensuring stability through a dynamic parameter freezing mechanism and storing episodic memories in a short-term temporary memory (similarly to what happens in the hippocampus). During the sleep phase, the training process is split into NREM and REM stages. In the NREM stage, the model's synaptic weights are consolidated using replayed samples from the short-term and long-term memory and the synaptic plasticity mechanism is activated, strengthening important connections and weakening unimportant ones. In the REM stage, the model is exposed to previously-unseen realistic visual sensory experience, and the dreaming process is activated, which enables the model to explore the potential feature space, thus preparing synapses to future knowledge. We evaluate the effectiveness of our approach on three benchmark datasets: CIFAR-10, Tiny-ImageNet and FG-ImageNet. In all cases, our method outperforms the baselines and prior work, yielding a significant performance gain on continual visual classification tasks. Furthermore, we demonstrate the usefulness of all processing stages and the importance of dreaming to enable positive forward transfer.
- [974] arXiv:2401.08632 (cross-list from cs.NE) [ pdf , other ]
-
Title: Synergizing Quality-Diversity with Descriptor-Conditioned Reinforcement LearningComments: arXiv admin note: substantial text overlap with arXiv:2303.03832Subjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: A fundamental trait of intelligence involves finding novel and creative solutions to address a given challenge or to adapt to unforeseen situations. Reflecting this, Quality-Diversity optimization is a family of Evolutionary Algorithms, that generates collections of both diverse and high-performing solutions. Among these, MAP-Elites is a prominent example, that has been successfully applied to a variety of domains, including evolutionary robotics. However, MAP-Elites performs a divergent search with random mutations originating from Genetic Algorithms, and thus, is limited to evolving populations of low-dimensional solutions. PGA-MAP-Elites overcomes this limitation using a gradient-based variation operator inspired by deep reinforcement learning which enables the evolution of large neural networks. Although high-performing in many environments, PGA-MAP-Elites fails on several tasks where the convergent search of the gradient-based variation operator hinders diversity. In this work, we present three contributions: (1) we enhance the Policy Gradient variation operator with a descriptor-conditioned critic that reconciles diversity search with gradient-based methods, (2) we leverage the actor-critic training to learn a descriptor-conditioned policy at no additional cost, distilling the knowledge of the population into one single versatile policy that can execute a diversity of behaviors, (3) we exploit the descriptor-conditioned actor by injecting it in the population, despite network architecture differences. Our method, DCG-MAP-Elites, achieves equal or higher QD score and coverage compared to all baselines on seven challenging continuous control locomotion tasks.
- [975] arXiv:2401.08636 (cross-list from cs.DC) [ pdf , other ]
-
Title: MLCommons Cloud Masking Benchmark with Early StoppingAuthors: Varshitha Chennamsetti , Gregor von Laszewski , Ruochen Gu , Laiba Mehnaz , Juri Papay , Samuel Jackson , Jeyan Thiyagalingam , Sergey V. Samsonau , Geoffrey C. FoxComments: 11 pages, 9 figures, 3 tables. arXiv admin note: text overlap with arXiv:2312.04799Subjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we report on work performed for the MLCommons Science Working Group on the cloud masking benchmark. MLCommons is a consortium that develops and maintains several scientific benchmarks that aim to benefit developments in AI. The benchmarks are conducted on the High Performance Computing (HPC) Clusters of New York University and University of Virginia, as well as a commodity desktop. We provide a description of the cloud masking benchmark, as well as a summary of our submission to MLCommons on the benchmark experiment we conducted. It includes a modification to the reference implementation of the cloud masking benchmark enabling early stopping. This benchmark is executed on the NYU HPC through a custom batch script that runs the various experiments through the batch queuing system while allowing for variation on the number of epochs trained. Our submission includes the modified code, a custom batch script to modify epochs, documentation, and the benchmark results. We report the highest accuracy (scientific metric) and the average time taken (performance metric) for training and inference that was achieved on NYU HPC Greene. We also provide a comparison of the compute capabilities between different systems by running the benchmark for one epoch. Our submission can be found in a Globus repository that is accessible to MLCommons Science Working Group.
- [976] arXiv:2401.08655 (cross-list from cs.CV) [ pdf , other ]
-
Title: SAiD: Speech-driven Blendshape Facial Animation with DiffusionComments: Fix bug related to the font sizeSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract: Speech-driven 3D facial animation is challenging due to the scarcity of large-scale visual-audio datasets despite extensive research. Most prior works, typically focused on learning regression models on a small dataset using the method of least squares, encounter difficulties generating diverse lip movements from speech and require substantial effort in refining the generated outputs. To address these issues, we propose a speech-driven 3D facial animation with a diffusion model (SAiD), a lightweight Transformer-based U-Net with a cross-modality alignment bias between audio and visual to enhance lip synchronization. Moreover, we introduce BlendVOCA, a benchmark dataset of pairs of speech audio and parameters of a blendshape facial model, to address the scarcity of public resources. Our experimental results demonstrate that the proposed approach achieves comparable or superior performance in lip synchronization to baselines, ensures more diverse lip movements, and streamlines the animation editing process.
- [977] arXiv:2401.08658 (cross-list from cs.RO) [ pdf , other ]
-
Title: End-To-End Planning of Autonomous Driving in Industry and Academia: 2022-2023Comments: 8 pages, 14 figuresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: This paper aims to provide a quick review of the methods including the technologies in detail that are currently reported in industry and academia. Specifically, this paper reviews the end-to-end planning, including Tesla FSD V12, Momenta 2023, Horizon Robotics 2023, Motional RoboTaxi 2022, Woven Planet (Toyota): Urban Driver, and Nvidia. In addition, we review the state-of-the-art academic studies that investigate end-to-end planning of autonomous driving. This paper provides readers with a concise structure and fast learning of state-of-the-art end-to-end planning for 2022-2023. This article provides a meaningful overview as introductory material for beginners to follow the state-of-the-art end-to-end planning of autonomous driving in industry and academia, as well as supplementary material for advanced researchers.
- [978] arXiv:2401.08669 (cross-list from cs.LG) [ pdf , other ]
-
Title: Deep Reinforcement Learning for Multi-Truck Vehicle Routing Problems with Multi-Leg Demand RoutesComments: 13 pages, 4 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep reinforcement learning (RL) has been shown to be effective in producing approximate solutions to some vehicle routing problems (VRPs), especially when using policies generated by encoder-decoder attention mechanisms. While these techniques have been quite successful for relatively simple problem instances, there are still under-researched and highly complex VRP variants for which no effective RL method has been demonstrated. In this work we focus on one such VRP variant, which contains multiple trucks and multi-leg routing requirements. In these problems, demand is required to move along sequences of nodes, instead of just from a start node to an end node. With the goal of making deep RL a viable strategy for real-world industrial-scale supply chain logistics, we develop new extensions to existing encoder-decoder attention models which allow them to handle multiple trucks and multi-leg routing requirements. Our models have the advantage that they can be trained for a small number of trucks and nodes, and then embedded into a large supply chain to yield solutions for larger numbers of trucks and nodes. We test our approach on a real supply chain environment arising in the operations of Japanese automotive parts manufacturer Aisin Corporation, and find that our algorithm outperforms Aisin's previous best solution.
- [979] arXiv:2401.08672 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Concept AlignmentAuthors: Sunayana Rane , Polyphony J. Bruna , Ilia Sucholutsky , Christopher Kello , Thomas L. GriffithsComments: NeurIPS MP2 Workshop 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
Abstract: Discussion of AI alignment (alignment between humans and AI systems) has focused on value alignment, broadly referring to creating AI systems that share human values. We argue that before we can even attempt to align values, it is imperative that AI systems and humans align the concepts they use to understand the world. We integrate ideas from philosophy, cognitive science, and deep learning to explain the need for concept alignment, not just value alignment, between humans and machines. We summarize existing accounts of how humans and machines currently learn concepts, and we outline opportunities and challenges in the path towards shared concepts. Finally, we explain how we can leverage the tools already being developed in cognitive science and AI research to accelerate progress towards concept alignment.
- [980] arXiv:2401.08683 (cross-list from cs.AR) [ pdf , other ]
-
Title: Zero-Shot RTL Code Generation with Attention Sink Augmented Large Language ModelsSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL); Software Engineering (cs.SE)
Abstract: The design and optimization of hardware have traditionally been resource-intensive, demanding considerable expertise and dependence on established design automation tools. This paper discusses the possibility of exploiting large language models to streamline the code generation process in hardware design. In contrast to earlier studies, this paper aims to use large language models that accepts high-level design specifications through a single prompt to generate corresponding Register-Transfer Level (RTL) code. The ability to use large language models on RTL code generation not only expedites design iteration cycles but also facilitates the exploration of design spaces that have computational challenges for conventional techniques. Through our evaluation, we demonstrate the shortcoming of existing attention mechanisms, and present the abilities of language models to produce functional, optimized, and industry-standard compliant RTL code when a novel attention mechanism is used. These findings underscore the expanding role of large language models in shaping the future landscape of architectural exploration and automation in hardware design.
- [981] arXiv:2401.08694 (cross-list from cs.CL) [ pdf , other ]
-
Title: Combining Confidence Elicitation and Sample-based Methods for Uncertainty Quantification in Misinformation MitigationComments: 12 pages, 11 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models have emerged as prime candidates to tackle misinformation mitigation. However, existing approaches struggle with hallucinations and overconfident predictions. We propose an uncertainty quantification framework that leverages both direct confidence elicitation and sampled-based consistency methods to provide better calibration for NLP misinformation mitigation solutions. We first investigate the calibration of sample-based consistency methods that exploit distinct features of consistency across sample sizes and stochastic levels. Next, we evaluate the performance and distributional shift of a robust numeric verbalization prompt across single vs. two-step confidence elicitation procedure. We also compare the performance of the same prompt with different versions of GPT and different numerical scales. Finally, we combine the sample-based consistency and verbalized methods to propose a hybrid framework that yields a better uncertainty estimation for GPT models. Overall, our work proposes novel uncertainty quantification methods that will improve the reliability of Large Language Models in misinformation mitigation applications.
- [982] arXiv:2401.08696 (cross-list from cs.AR) [ pdf , other ]
-
Title: Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNsComments: Accepted for publication at DATE 2024Subjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: High-level synthesis (HLS) notably speeds up the hardware design process by avoiding RTL programming. However, the turnaround time of HLS increases significantly when post-route quality of results (QoR) are considered during optimization. To tackle this issue, we propose a hierarchical post-route QoR prediction approach for FPGA HLS, which features: (1) a modeling flow that directly estimates latency and post-route resource usage from C/C++ programs; (2) a graph construction method that effectively represents the control and data flow graph of source code and effects of HLS pragmas; and (3) a hierarchical GNN training and prediction method capable of capturing the impact of loop hierarchies. Experimental results show that our method presents a prediction error of less than 10% for different types of QoR metrics, which gains tremendous improvement compared with the state-of-the-art GNN methods. By adopting our proposed methodology, the runtime for design space exploration in HLS is shortened to tens of minutes and the achieved ADRS is reduced to 6.91% on average.
- [983] arXiv:2401.08711 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Assistant, Parrot, or Colonizing Loudspeaker? ChatGPT Metaphors for Developing Critical AI LiteraciesComments: This is a preprint (accepted version) of an article that has been accepted for publication at the journal Open Praxis: this https URLSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: This study explores how discussing metaphors for AI can help build awareness of the frames that shape our understanding of AI systems, particularly large language models (LLMs) like ChatGPT. Given the pressing need to teach "critical AI literacy", discussion of metaphor provides an opportunity for inquiry and dialogue with space for nuance, playfulness, and critique. Using a collaborative autoethnographic methodology, we analyzed metaphors from a range of sources, and reflected on them individually according to seven questions, then met and discussed our interpretations. We then analyzed how our reflections contributed to the three kinds of literacies delineated in Selber's multiliteracies framework: functional, critical, and rhetorical. These allowed us to analyze questions of ethics, equity, and accessibility in relation to AI. We explored each metaphor along the dimension of whether or not it was promoting anthropomorphizing, and to what extent such metaphors imply that AI is sentient. Our findings highlight the role of metaphor reflection in fostering a nuanced understanding of AI, suggesting that our collaborative autoethnographic approach as well as the heuristic model of plotting AI metaphors on dimensions of anthropomorphism and multiliteracies, might be useful for educators and researchers in the pursuit of advancing critical AI literacy.
- [984] arXiv:2401.08714 (cross-list from cs.HC) [ pdf , other ]
-
Title: Training program on sign language: social inclusion through Virtual Reality in ISENSE projectAuthors: Alessia Bisio , Enrique Yeguas-Bolívar , Pilar Aparicio-Martínez , María Dolores Redel-Macías , Sara Pinzi , Stefano Rossi , Juri TaborriComments: 6 pages, 4 figures, MetroXRAINE 2023 Conference, ISENSE european projectJournal-ref: 2023 IEEE International Conference on Metrology for Extended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), Milano, Italy, 2023, pp. 104-109Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Abstract: Structured hand gestures that incorporate visual motions and signs are used in sign language. Sign language is a valuable means of daily communication for individuals who are deaf or have speech impairments, but it is still rare among hearing people, and fewer are capable of understand it. Within the academic context, parents and teachers play a crucial role in supporting deaf students from childhood by facilitating their learning of sign language. In the last years, among all the teaching tools useful for learning sign language, the use of Virtual Reality (VR) has increased, as it has been demonstrated to improve retention, memory and attention during the learning process. The ISENSE project has been created to assist students with deafness during their academic life by proposing different technological tools for teaching sign language to the hearing community in the academic context. As part of the ISENSE project, this work aims to develop an application for Spanish and Italian sign language recognition that exploits the VR environment to quickly and easily create a comprehensive database of signs and an Artificial Intelligence (AI)-based software to accurately classify and recognize static and dynamic signs: from letters to sentences.
- [985] arXiv:2401.08715 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Selecting Subsets of Source Data for Transfer Learning with Applications in Metal Additive ManufacturingComments: 26 pages, 9 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Considering data insufficiency in metal additive manufacturing (AM), transfer learning (TL) has been adopted to extract knowledge from source domains (e.g., completed printings) to improve the modeling performance in target domains (e.g., new printings). Current applications use all accessible source data directly in TL with no regard to the similarity between source and target data. This paper proposes a systematic method to find appropriate subsets of source data based on similarities between the source and target datasets for a given set of limited target domain data. Such similarity is characterized by the spatial and model distance metrics. A Pareto frontier-based source data selection method is developed, where the source data located on the Pareto frontier defined by two similarity distance metrics are selected iteratively. The method is integrated into an instance-based TL method (decision tree regression model) and a model-based TL method (fine-tuned artificial neural network). Both models are then tested on several regression tasks in metal AM. Comparison results demonstrate that 1) the source data selection method is general and supports integration with various TL methods and distance metrics, 2) compared with using all source data, the proposed method can find a small subset of source data from the same domain with better TL performance in metal AM regression tasks involving different processes and machines, and 3) when multiple source domains exist, the source data selection method could find the subset from one source domain to obtain comparable or better TL performance than the model constructed using data from all source domains.
- [986] arXiv:2401.08721 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: A Telerehabilitation System for the Selection, Evaluation and Remote Management of TherapiesJournal-ref: Sensors 18(5): 1459 (2018)Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Telerehabilitation systems that support physical therapy sessions anywhere can help save healthcare costs while also improving the quality of life of the users that need rehabilitation. The main contribution of this paper is to present, as a whole, all the features supported by the innovative Kinect-based Telerehabilitation System (KiReS). In addition to the functionalities provided by current systems, it handles two new ones that could be incorporated into them, in order to give a step forward towards a new generation of telerehabilitation systems. The knowledge extraction functionality handles knowledge about the physical therapy record of patients and treatment protocols described in an ontology, named TRHONT, to select the adequate exercises for the rehabilitation of patients. The teleimmersion functionality provides a convenient, effective and user-friendly experience when performing the telerehabilitation, through a two-way real-time multimedia communication. The ontology contains about 2300 classes and 100 properties, and the system allows a reliable transmission of Kinect video depth, audio and skeleton data, being able to adapt to various network conditions. Moreover, the system has been tested with patients who suffered from shoulder disorders or total hip replacement.
- [987] arXiv:2401.08727 (cross-list from cs.LG) [ pdf , other ]
-
Title: MA2GCN: Multi Adjacency relationship Attention Graph Convolutional Networks for Traffic Prediction using Trajectory dataSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The problem of traffic congestion not only causes a large amount of economic losses, but also seriously endangers the urban environment. Predicting traffic congestion has important practical significance. So far, most studies have been based on historical data from sensors placed on different roads to predict future traffic flow and speed, to analyze the traffic congestion conditions of a certain road segment. However, due to the fixed position of sensors, it is difficult to mine new information. On the other hand, vehicle trajectory data is more flexible and can extract traffic information as needed. Therefore, we proposed a new traffic congestion prediction model - Multi Adjacency relationship Attention Graph Convolutional Networks(MA2GCN). This model transformed vehicle trajectory data into graph structured data in grid form, and proposed a vehicle entry and exit matrix based on the mobility between different grids. At the same time, in order to improve the performance of the model, this paper also built a new adaptive adjacency matrix generation method and adjacency matrix attention module. This model mainly used gated temporal convolution and graph convolution to extract temporal and spatial information, respectively. Compared with multiple baselines, our model achieved the best performance on Shanghai taxi GPS trajectory dataset. The code is available at this https URL .
- [988] arXiv:2401.08728 (cross-list from cs.MA) [ pdf , other ]
-
Title: AgentMixer: Multi-Agent Correlated Policy FactorizationSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: Centralized training with decentralized execution (CTDE) is widely employed to stabilize partially observable multi-agent reinforcement learning (MARL) by utilizing a centralized value function during training. However, existing methods typically assume that agents make decisions based on their local observations independently, which may not lead to a correlated joint policy with sufficient coordination. Inspired by the concept of correlated equilibrium, we propose to introduce a \textit{strategy modification} to provide a mechanism for agents to correlate their policies. Specifically, we present a novel framework, AgentMixer, which constructs the joint fully observable policy as a non-linear combination of individual partially observable policies. To enable decentralized execution, one can derive individual policies by imitating the joint policy. Unfortunately, such imitation learning can lead to \textit{asymmetric learning failure} caused by the mismatch between joint policy and individual policy information. To mitigate this issue, we jointly train the joint policy and individual policies and introduce \textit{Individual-Global-Consistency} to guarantee mode consistency between the centralized and decentralized policies. We then theoretically prove that AgentMixer converges to an $\epsilon$-approximate Correlated Equilibrium. The strong experimental performance on three MARL benchmarks demonstrates the effectiveness of our method.
- [989] arXiv:2401.08739 (cross-list from cs.CV) [ pdf , other ]
-
Title: EgoGen: An Egocentric Synthetic Data GeneratorAuthors: Gen Li , Kaifeng Zhao , Siwei Zhang , Xiaozhong Lyu , Mihai Dusmanu , Yan Zhang , Marc Pollefeys , Siyu TangComments: Accepted by CVPR 2024 (Oral). 23 pages, 17 figures. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Understanding the world in first-person view is fundamental in Augmented Reality (AR). This immersive perspective brings dramatic visual changes and unique challenges compared to third-person views. Synthetic data has empowered third-person-view vision models, but its application to embodied egocentric perception tasks remains largely unexplored. A critical challenge lies in simulating natural human movements and behaviors that effectively steer the embodied cameras to capture a faithful egocentric representation of the 3D world. To address this challenge, we introduce EgoGen, a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks. At the heart of EgoGen is a novel human motion synthesis model that directly leverages egocentric visual inputs of a virtual human to sense the 3D environment. Combined with collision-avoiding motion primitives and a two-stage reinforcement learning approach, our motion synthesis model offers a closed-loop solution where the embodied perception and movement of the virtual human are seamlessly coupled. Compared to previous works, our model eliminates the need for a pre-defined global path, and is directly applicable to dynamic environments. Combined with our easy-to-use and scalable data generation pipeline, we demonstrate EgoGen's efficacy in three tasks: mapping and localization for head-mounted cameras, egocentric camera tracking, and human mesh recovery from egocentric views. EgoGen will be fully open-sourced, offering a practical solution for creating realistic egocentric training data and aiming to serve as a useful tool for egocentric computer vision research. Refer to our project page: this https URL .
- [990] arXiv:2401.08741 (cross-list from cs.CV) [ pdf , other ]
-
Title: Fixed Point Diffusion ModelsComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We introduce the Fixed Point Diffusion Model (FPDM), a novel approach to image generation that integrates the concept of fixed point solving into the framework of diffusion-based generative modeling. Our approach embeds an implicit fixed point solving layer into the denoising network of a diffusion model, transforming the diffusion process into a sequence of closely-related fixed point problems. Combined with a new stochastic training method, this approach significantly reduces model size, reduces memory usage, and accelerates training. Moreover, it enables the development of two new techniques to improve sampling efficiency: reallocating computation across timesteps and reusing fixed point solutions between timesteps. We conduct extensive experiments with state-of-the-art models on ImageNet, FFHQ, CelebA-HQ, and LSUN-Church, demonstrating substantial improvements in performance and efficiency. Compared to the state-of-the-art DiT model, FPDM contains 87% fewer parameters, consumes 60% less memory during training, and improves image generation quality in situations where sampling computation or time is limited. Our code and pretrained models are available at this https URL .
- [991] arXiv:2401.08815 (cross-list from cs.CV) [ pdf , other ]
-
Title: Adversarial Supervision Makes Layout-to-Image Diffusion Models ThriveSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Despite the recent advances in large-scale diffusion models, little progress has been made on the layout-to-image (L2I) synthesis task. Current L2I models either suffer from poor editability via text or weak alignment between the generated image and the input layout. This limits their usability in practice. To mitigate this, we propose to integrate adversarial supervision into the conventional training pipeline of L2I diffusion models (ALDM). Specifically, we employ a segmentation-based discriminator which provides explicit feedback to the diffusion generator on the pixel-level alignment between the denoised image and the input layout. To encourage consistent adherence to the input layout over the sampling steps, we further introduce the multistep unrolling strategy. Instead of looking at a single timestep, we unroll a few steps recursively to imitate the inference process, and ask the discriminator to assess the alignment of denoised images with the layout over a certain time window. Our experiments show that ALDM enables layout faithfulness of the generated images, while allowing broad editability via text prompts. Moreover, we showcase its usefulness for practical applications: by synthesizing target distribution samples via text control, we improve domain generalization of semantic segmentation models by a large margin (~12 mIoU points).
- [992] arXiv:2401.08819 (cross-list from cs.LG) [ pdf , other ]
-
Title: Learning from Sparse Offline Datasets via Conservative Density EstimationComments: ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Offline reinforcement learning (RL) offers a promising direction for learning policies from pre-collected datasets without requiring further interactions with the environment. However, existing methods struggle to handle out-of-distribution (OOD) extrapolation errors, especially in sparse reward or scarce data settings. In this paper, we propose a novel training algorithm called Conservative Density Estimation (CDE), which addresses this challenge by explicitly imposing constraints on the state-action occupancy stationary distribution. CDE overcomes the limitations of existing approaches, such as the stationary distribution correction method, by addressing the support mismatch issue in marginal importance sampling. Our method achieves state-of-the-art performance on the D4RL benchmark. Notably, CDE consistently outperforms baselines in challenging tasks with sparse rewards or insufficient data, demonstrating the advantages of our approach in addressing the extrapolation error problem in offline RL.
- [993] arXiv:2401.08850 (cross-list from cs.LG) [ pdf , other ]
-
Title: REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision ProcessesComments: ICLR camera ready versionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Discrete-action reinforcement learning algorithms often falter in tasks with high-dimensional discrete action spaces due to the vast number of possible actions. A recent advancement leverages value-decomposition, a concept from multi-agent reinforcement learning, to tackle this challenge. This study delves deep into the effects of this value-decomposition, revealing that whilst it curtails the over-estimation bias inherent to Q-learning algorithms, it amplifies target variance. To counteract this, we present an ensemble of critics to mitigate target variance. Moreover, we introduce a regularisation loss that helps to mitigate the effects that exploratory actions in one dimension can have on the value of optimal actions in other dimensions. Our novel algorithm, REValueD, tested on discretised versions of the DeepMind Control Suite tasks, showcases superior performance, especially in the challenging humanoid and dog tasks. We further dissect the factors influencing REValueD's performance, evaluating the significance of the regularisation loss and the scalability of REValueD with increasing sub-actions per dimension.
- [994] arXiv:2401.08863 (cross-list from cs.LG) [ pdf , other ]
-
Title: Robust Localization of Key Fob Using Channel Impulse Response of Ultra Wide Band Sensors for Keyless Entry SystemsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Using neural networks for localization of key fob within and surrounding a car as a security feature for keyless entry is fast emerging. In this paper we study: 1) the performance of pre-computed features of neural networks based UWB (ultra wide band) localization classification forming the baseline of our experiments. 2) Investigate the inherent robustness of various neural networks; therefore, we include the study of robustness of the adversarial examples without any adversarial training in this work. 3) Propose a multi-head self-supervised neural network architecture which outperforms the baseline neural networks without any adversarial training. The model's performance improved by 67% at certain ranges of adversarial magnitude for fast gradient sign method and 37% each for basic iterative method and projected gradient descent method.
- [995] arXiv:2401.08875 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: DCRMTA: Unbiased Causal Representation for Multi-touch AttributionAuthors: Jiaming TangComments: 9 pages, 9 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME)
Abstract: Multi-touch attribution (MTA) currently plays a pivotal role in achieving a fair estimation of the contributions of each advertising touchpoint to-wards conversion behavior, deeply influencing budget allocation and advertising recommenda-tion. Previous works attempted to eliminate the bias caused by user preferences to achieve the unbiased assumption of the conversion model. The multi-model collaboration method is not ef-ficient, and the complete elimination of user in-fluence also eliminates the causal effect of user features on conversion, resulting in limited per-formance of the conversion model. This paper re-defines the causal effect of user features on con-versions and proposes a novel end-to-end ap-proach, Deep Causal Representation for MTA (DCRMTA). Our model focuses on extracting causa features between conversions and users while eliminating confounding variables. Fur-thermore, extensive experiments demonstrate DCRMTA's superior performance in converting prediction across varying data distributions, while also effectively attributing value across dif-ferent advertising channels.
- [996] arXiv:2401.08887 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting TranscriptionAuthors: Alon Vinnikov , Amir Ivry , Aviv Hurvitz , Igor Abramovski , Sharon Koubi , Ilya Gurvich , Shai Pe`er , Xiong Xiao , Benjamin Martinez Elizalde , Naoyuki Kanda , Xiaofei Wang , Shalev Shaer , Stav Yagev , Yossi Asher , Sunit Sivasankaran , Yifan Gong , Min Tang , Huaming Wang , Eyal KrupkaComments: preprintSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Abstract: We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings (``NOTSOFAR-1'') Challenge alongside datasets and baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First, a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics. It is recorded across 30 conference rooms, featuring 4-8 attendees and a total of 35 unique speakers. Second, a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization, incorporating 15,000 real acoustic transfer functions. The tasks focus on single-device DASR, where multi-channel devices always share the same known geometry. This is aligned with common setups in actual conference rooms, and avoids technical complexities associated with multi-device tasks. It also allows for the development of geometry-specific solutions. The NOTSOFAR-1 Challenge aims to advance research in the field of distant conversational speech recognition, providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive high-quality training and benchmarking datasets.
- [997] arXiv:2401.08897 (cross-list from cs.LG) [ pdf , other ]
-
Title: CFASL: Composite Factor-Aligned Symmetry Learning for Disentanglement in Variational AutoEncoderComments: 21 pages, 14 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Symmetries of input and latent vectors have provided valuable insights for disentanglement learning in VAEs.However, only a few works were proposed as an unsupervised method, and even these works require known factor information in training data. We propose a novel method, Composite Factor-Aligned Symmetry Learning (CFASL), which is integrated into VAEs for learning symmetry-based disentanglement in unsupervised learning without any knowledge of the dataset factor information.CFASL incorporates three novel features for learning symmetry-based disentanglement: 1) Injecting inductive bias to align latent vector dimensions to factor-aligned symmetries within an explicit learnable symmetry codebook 2) Learning a composite symmetry to express unknown factors change between two random samples by learning factor-aligned symmetries within the codebook 3) Inducing group equivariant encoder and decoder in training VAEs with the two conditions. In addition, we propose an extended evaluation metric for multi-factor changes in comparison to disentanglement evaluation in VAEs. In quantitative and in-depth qualitative analysis, CFASL demonstrates a significant improvement of disentanglement in single-factor change, and multi-factor change conditions compared to state-of-the-art methods.
- [998] arXiv:2401.08898 (cross-list from cs.LG) [ pdf , other ]
-
Title: Bridging State and History Representations: Understanding Self-Predictive RLAuthors: Tianwei Ni , Benjamin Eysenbach , Erfan Seyedsalehi , Michel Ma , Clement Gehring , Aditya Mahajan , Pierre-Luc BaconComments: ICLR 2024 (Poster). Code is available at this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. Furthermore, we provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations. These findings together yield a minimalist algorithm to learn self-predictive representations for states and histories. We validate our theories by applying our algorithm to standard MDPs, MDPs with distractors, and POMDPs with sparse rewards. These findings culminate in a set of preliminary guidelines for RL practitioners.
- [999] arXiv:2401.08930 (cross-list from cs.CV) [ pdf , other ]
-
Title: 3D Human Pose Analysis via Diffusion SynthesisSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Diffusion models have demonstrated remarkable success in generative modeling. In this paper, we propose PADS (Pose Analysis by Diffusion Synthesis), a novel framework designed to address various challenges in 3D human pose analysis through a unified pipeline. Central to PADS are two distinctive strategies: i) learning a task-agnostic pose prior using a diffusion synthesis process to effectively capture the kinematic constraints in human pose data, and ii) unifying multiple pose analysis tasks like estimation, completion, denoising, etc, as instances of inverse problems. The learned pose prior will be treated as a regularization imposing on task-specific constraints, guiding the optimization process through a series of conditional denoising steps. PADS represents the first diffusion-based framework for tackling general 3D human pose analysis within the inverse problem framework. Its performance has been validated on different benchmarks, signaling the adaptability and robustness of this pipeline.
- [1000] arXiv:2401.08932 (cross-list from cs.CV) [ pdf , other ]
-
Title: Learning to detect cloud and snow in remote sensing images from noisy labelsAuthors: Zili Liu , Hao Chen , Wenyuan Li , Keyan Chen , Zipeng Qi , Chenyang Liu , Zhengxia Zou , Zhenwei ShiSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Detecting clouds and snow in remote sensing images is an essential preprocessing task for remote sensing imagery. Previous works draw inspiration from semantic segmentation models in computer vision, with most research focusing on improving model architectures to enhance detection performance. However, unlike natural images, the complexity of scenes and the diversity of cloud types in remote sensing images result in many inaccurate labels in cloud and snow detection datasets, introducing unnecessary noises into the training and testing processes. By constructing a new dataset and proposing a novel training strategy with the curriculum learning paradigm, we guide the model in reducing overfitting to noisy labels. Additionally, we design a more appropriate model performance evaluation method, that alleviates the performance assessment bias caused by noisy labels. By conducting experiments on models with UNet and Segformer, we have validated the effectiveness of our proposed method. This paper is the first to consider the impact of label noise on the detection of clouds and snow in remote sensing images.
- [1001] arXiv:2401.08940 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: CEL: A Continual Learning Model for Disease Outbreak Prediction by Leveraging Domain Adaptation via Elastic Weight ConsolidationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Continual learning, the ability of a model to learn over time without forgetting previous knowledge and, therefore, be adaptive to new data, is paramount in dynamic fields such as disease outbreak prediction. Deep neural networks, i.e., LSTM, are prone to error due to catastrophic forgetting. This study introduces a novel CEL model for continual learning by leveraging domain adaptation via Elastic Weight Consolidation (EWC). This model aims to mitigate the catastrophic forgetting phenomenon in a domain incremental setting. The Fisher Information Matrix (FIM) is constructed with EWC to develop a regularization term that penalizes changes to important parameters, namely, the important previous knowledge. CEL's performance is evaluated on three distinct diseases, Influenza, Mpox, and Measles, with different metrics. The high R-squared values during evaluation and reevaluation outperform the other state-of-the-art models in several contexts, indicating that CEL adapts to incremental data well. CEL's robustness and reliability are underscored by its minimal 65% forgetting rate and 18% higher memory stability compared to existing benchmark studies. This study highlights CEL's versatility in disease outbreak prediction, addressing evolving data with temporal patterns. It offers a valuable model for proactive disease control with accurate, timely predictions.
- [1002] arXiv:2401.08957 (cross-list from cs.RO) [ pdf , other ]
-
Title: SWBT: Similarity Weighted Behavior Transformer with the Imperfect Demonstration for Robotic ManipulationAuthors: Kun Wu , Ning Liu , Zhen Zhao , Di Qiu , Jinming Li , Zhengping Che , Zhiyuan Xu , Qinru Qiu , Jian TangComments: 8 pages, 5 figuresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Imitation learning (IL), aiming to learn optimal control policies from expert demonstrations, has been an effective method for robot manipulation tasks. However, previous IL methods either only use expensive expert demonstrations and omit imperfect demonstrations or rely on interacting with the environment and learning from online experiences. In the context of robotic manipulation, we aim to conquer the above two challenges and propose a novel framework named Similarity Weighted Behavior Transformer (SWBT). SWBT effectively learn from both expert and imperfect demonstrations without interaction with environments. We reveal that the easy-to-get imperfect demonstrations, such as forward and inverse dynamics, significantly enhance the network by learning fruitful information. To the best of our knowledge, we are the first to attempt to integrate imperfect demonstrations into the offline imitation learning setting for robot manipulation tasks. Extensive experiments on the ManiSkill2 benchmark built on the high-fidelity Sapien simulator and real-world robotic manipulation tasks demonstrated that the proposed method can extract better features and improve the success rates for all tasks. Our code will be released upon acceptance of the paper.
- [1003] arXiv:2401.08959 (cross-list from cs.LG) [ pdf , other ]
-
Title: Towards Off-Policy Reinforcement Learning for Ranking Policies with Human FeedbackSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Probabilistic learning to rank (LTR) has been the dominating approach for optimizing the ranking metric, but cannot maximize long-term rewards. Reinforcement learning models have been proposed to maximize user long-term rewards by formulating the recommendation as a sequential decision-making problem, but could only achieve inferior accuracy compared to LTR counterparts, primarily due to the lack of online interactions and the characteristics of ranking. In this paper, we propose a new off-policy value ranking (VR) algorithm that can simultaneously maximize user long-term rewards and optimize the ranking metric offline for improved sample efficiency in a unified Expectation-Maximization (EM) framework. We theoretically and empirically show that the EM process guides the leaned policy to enjoy the benefit of integration of the future reward and ranking metric, and learn without any online interactions. Extensive offline and online experiments demonstrate the effectiveness of our methods.
- [1004] arXiv:2401.08960 (cross-list from cs.HC) [ pdf , other ]
-
Title: From User Surveys to Telemetry-Driven Agents: Exploring the Potential of Personalized Productivity SolutionsAuthors: Subigya Nepal , Javier Hernandez , Talie Massachi , Kael Rowan , Judith Amores , Jina Suh , Gonzalo Ramos , Brian Houck , Shamsi T. Iqbal , Mary CzerwinskiSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: We present a comprehensive, user-centric approach to understand preferences in AI-based productivity agents and develop personalized solutions tailored to users' needs. Utilizing a two-phase method, we first conducted a survey with 363 participants, exploring various aspects of productivity, communication style, agent approach, personality traits, personalization, and privacy. Drawing on the survey insights, we developed a GPT-4 powered personalized productivity agent that utilizes telemetry data gathered via Viva Insights from information workers to provide tailored assistance. We compared its performance with alternative productivity-assistive tools, such as dashboard and narrative, in a study involving 40 participants. Our findings highlight the importance of user-centric design, adaptability, and the balance between personalization and privacy in AI-assisted productivity tools. By building on the insights distilled from our study, we believe that our work can enable and guide future research to further enhance productivity solutions, ultimately leading to optimized efficiency and user experiences for information workers.
- [1005] arXiv:2401.08973 (cross-list from cs.CV) [ pdf , other ]
-
Title: OCTO+: A Suite for Automatic Open-Vocabulary Object Placement in Mixed RealityComments: 2024 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIXVR)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: One key challenge in Augmented Reality is the placement of virtual content in natural locations. Most existing automated techniques can only work with a closed-vocabulary, fixed set of objects. In this paper, we introduce and evaluate several methods for automatic object placement using recent advances in open-vocabulary vision-language models. Through a multifaceted evaluation, we identify a new state-of-the-art method, OCTO+. We also introduce a benchmark for automatically evaluating the placement of virtual objects in augmented reality, alleviating the need for costly user studies. Through this, in addition to human evaluations, we find that OCTO+ places objects in a valid region over 70% of the time, outperforming other methods on a range of metrics.
- [1006] arXiv:2401.08977 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: FedLoGe: Joint Local and Generic Federated Learning under Long-tailed DataAuthors: Zikai Xiao , Zihan Chen , Liyinglan Liu , Yang Feng , Jian Wu , Wanlu Liu , Joey Tianyi Zhou , Howard Hao Yang , Zuozhu LiuComments: Accepted by ICLR 2024, code: this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Federated Long-Tailed Learning (Fed-LT), a paradigm wherein data collected from decentralized local clients manifests a globally prevalent long-tailed distribution, has garnered considerable attention in recent times. In the context of Fed-LT, existing works have predominantly centered on addressing the data imbalance issue to enhance the efficacy of the generic global model while neglecting the performance at the local level. In contrast, conventional Personalized Federated Learning (pFL) techniques are primarily devised to optimize personalized local models under the presumption of a balanced global data distribution. This paper introduces an approach termed Federated Local and Generic Model Training in Fed-LT (FedLoGe), which enhances both local and generic model performance through the integration of representation learning and classifier alignment within a neural collapse framework. Our investigation reveals the feasibility of employing a shared backbone as a foundational framework for capturing overarching global trends, while concurrently employing individualized classifiers to encapsulate distinct refinements stemming from each client's local features. Building upon this discovery, we establish the Static Sparse Equiangular Tight Frame Classifier (SSE-C), inspired by neural collapse principles that naturally prune extraneous noisy features and foster the acquisition of potent data representations. Furthermore, leveraging insights from imbalance neural collapse's classifier norm patterns, we develop Global and Local Adaptive Feature Realignment (GLA-FR) via an auxiliary global classifier and personalized Euclidean norm transfer to align global features with client preferences. Extensive experimental results on CIFAR-10/100-LT, ImageNet, and iNaturalist demonstrate the advantage of our method over state-of-the-art pFL and Fed-LT approaches.
- [1007] arXiv:2401.08984 (cross-list from cs.LG) [ pdf , other ]
-
Title: A GAN-based data poisoning framework against anomaly detection in vertical federated learningComments: 6 pages, 7 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: In vertical federated learning (VFL), commercial entities collaboratively train a model while preserving data privacy. However, a malicious participant's poisoning attack may degrade the performance of this collaborative model. The main challenge in achieving the poisoning attack is the absence of access to the server-side top model, leaving the malicious participant without a clear target model. To address this challenge, we introduce an innovative end-to-end poisoning framework P-GAN. Specifically, the malicious participant initially employs semi-supervised learning to train a surrogate target model. Subsequently, this participant employs a GAN-based method to produce adversarial perturbations to degrade the surrogate target model's performance. Finally, the generator is obtained and tailored for VFL poisoning. Besides, we develop an anomaly detection algorithm based on a deep auto-encoder (DAE), offering a robust defense mechanism to VFL scenarios. Through extensive experiments, we evaluate the efficacy of P-GAN and DAE, and further analyze the factors that influence their performance.
- [1008] arXiv:2401.08996 (cross-list from cs.LG) [ pdf , other ]
-
Title: MicroNAS: Zero-Shot Neural Architecture Search for MCUsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Neural Architecture Search (NAS) effectively discovers new Convolutional Neural Network (CNN) architectures, particularly for accuracy optimization. However, prior approaches often require resource-intensive training on super networks or extensive architecture evaluations, limiting practical applications. To address these challenges, we propose MicroNAS, a hardware-aware zero-shot NAS framework designed for microcontroller units (MCUs) in edge computing. MicroNAS considers target hardware optimality during the search, utilizing specialized performance indicators to identify optimal neural architectures without high computational costs. Compared to previous works, MicroNAS achieves up to 1104x improvement in search efficiency and discovers models with over 3.23x faster MCU inference while maintaining similar accuracy
- [1009] arXiv:2401.09003 (cross-list from cs.CL) [ pdf , other ]
-
Title: Augmenting Math Word Problems via Iterative Question ComposingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Despite the advancements in large language models (LLMs) for mathematical reasoning, solving competition-level math problems remains a significant challenge, especially for open-source LLMs without external tools. We introduce the MMIQC dataset, comprising a mixture of processed web data and synthetic question-response pairs, aimed at enhancing the mathematical reasoning capabilities of base language models. Models fine-tuned on MMIQC consistently surpass their counterparts in performance on the MATH benchmark across various model sizes. Notably, Qwen-72B-MMIQC achieves a 45.0% accuracy, exceeding the previous open-source state-of-the-art by 8.2% and outperforming the initial version GPT-4 released in 2023. Extensive evaluation results on Hungarian high school finals suggest that such improvement can generalize to unseen data. Our ablation study on MMIQC reveals that a large part of the improvement can be attributed to our novel augmentation method, Iterative Question Composing (IQC), which involves iteratively composing new questions from seed problems using an LLM and applying rejection sampling through another LLM. The MMIQC dataset is available on the HuggingFace hub at this https URL . Our code is available at this https URL .
- [1010] arXiv:2401.09008 (cross-list from cs.CV) [ pdf , other ]
-
Title: Hybrid of DiffStride and Spectral Pooling in Convolutional Neural NetworksAuthors: Sulthan Rafif , Mochamad Arfan Ravy Wahyu Pratama , Mohammad Faris Azhar , Ahmad Mustafidul Ibad , Lailil Muflikhah , Novanto YudistiraJournal-ref: CSIAM Transactions on Applied Mathematics; R. Riad et al, "Learning strides in convolutional neural networks," pp. 1-16, 2022. [Online];Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Stride determines the distance between adjacent filter positions as the filter moves across the input. A fixed stride causes important information contained in the image can not be captured, so that important information is not classified. Therefore, in previous research, the DiffStride Method was applied, namely the Strided Convolution Method with which it can learn its own stride value. Severe Quantization and a constraining lower bound on preserved information are arises with Max Pooling Downsampling Method. Spectral Pooling reduce the constraint lower bound on preserved information by cutting off the representation in the frequency domain. In this research a CNN Model is proposed with the Downsampling Learnable Stride Technique performed by Backpropagation combined with the Spectral Pooling Technique. Diffstride and Spectral Pooling techniques are expected to maintain most of the information contained in the image. In this study, we compare the Hybrid Method, which is a combined implementation of Spectral Pooling and DiffStride against the Baseline Method, which is the DiffStride implementation on ResNet 18. The accuracy result of the DiffStride combination with Spectral Pooling improves over DiffStride which is baseline method by 0.0094. This shows that the Hybrid Method can maintain most of the information by cutting of the representation in the frequency domain and determine the stride of the learning result through Backpropagation.
- [1011] arXiv:2401.09011 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Inductive Models for Artificial Intelligence Systems are Insufficient without Good ExplanationsAuthors: Udesh HabaraduwaSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper discusses the limitations of machine learning (ML), particularly deep artificial neural networks (ANNs), which are effective at approximating complex functions but often lack transparency and explanatory power. It highlights the `problem of induction' : the philosophical issue that past observations may not necessarily predict future events, a challenge that ML models face when encountering new, unseen data. The paper argues for the importance of not just making predictions but also providing good explanations, a feature that current models often fail to deliver. It suggests that for AI to progress, we must seek models that offer insights and explanations, not just predictions.
- [1012] arXiv:2401.09029 (cross-list from cs.CV) [ pdf , other ]
-
Title: Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor GradingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Brain tumor represents one of the most fatal cancers around the world, and is very common in children and the elderly. Accurate identification of the type and grade of tumor in the early stages plays an important role in choosing a precise treatment plan. The Magnetic Resonance Imaging (MRI) protocols of different sequences provide clinicians with important contradictory information to identify tumor regions. However, manual assessment is time-consuming and error-prone due to big amount of data and the diversity of brain tumor types. Hence, there is an unmet need for MRI automated brain tumor diagnosis. We observe that the predictive capability of uni-modality models is limited and their performance varies widely across modalities, and the commonly used modality fusion methods would introduce potential noise, which results in significant performance degradation. To overcome these challenges, we propose a novel cross-modality guidance-aided multi-modal learning with dual attention for addressing the task of MRI brain tumor grading. To balance the tradeoff between model efficiency and efficacy, we employ ResNet Mix Convolution as the backbone network for feature extraction. Besides, dual attention is applied to capture the semantic interdependencies in spatial and slice dimensions respectively. To facilitate information interaction among modalities, we design a cross-modality guidance-aided module where the primary modality guides the other secondary modalities during the process of training, which can effectively leverage the complementary information of different MRI modalities and meanwhile alleviate the impact of the possible noise.
- [1013] arXiv:2401.09034 (cross-list from cs.IR) [ pdf , other ]
-
Title: UOEP: User-Oriented Exploration Policy for Enhancing Long-Term User Experiences in Recommender SystemsSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Reinforcement learning (RL) has gained traction for enhancing user long-term experiences in recommender systems by effectively exploring users' interests. However, modern recommender systems exhibit distinct user behavioral patterns among tens of millions of items, which increases the difficulty of exploration. For example, user behaviors with different activity levels require varying intensity of exploration, while previous studies often overlook this aspect and apply a uniform exploration strategy to all users, which ultimately hurts user experiences in the long run. To address these challenges, we propose User-Oriented Exploration Policy (UOEP), a novel approach facilitating fine-grained exploration among user groups. We first construct a distributional critic which allows policy optimization under varying quantile levels of cumulative reward feedbacks from users, representing user groups with varying activity levels. Guided by this critic, we devise a population of distinct actors aimed at effective and fine-grained exploration within its respective user group. To simultaneously enhance diversity and stability during the exploration process, we further introduce a population-level diversity regularization term and a supervision module. Experimental results on public recommendation datasets demonstrate that our approach outperforms all other baselines in terms of long-term performance, validating its user-oriented exploration effectiveness. Meanwhile, further analyses reveal our approach's benefits of improved performance for low-activity users as well as increased fairness among users.
- [1014] arXiv:2401.09067 (cross-list from cs.LG) [ pdf , other ]
-
Title: Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular EmbeddingComments: Accepted to AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Deep neural networks are susceptible to catastrophic forgetting when trained on sequential tasks. Various continual learning (CL) methods often rely on exemplar buffers or/and network expansion for balancing model stability and plasticity, which, however, compromises their practical value due to privacy and memory concerns. Instead, this paper considers a strict yet realistic setting, where the training data from previous tasks is unavailable and the model size remains relatively constant during sequential training. To achieve such desiderata, we propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. This is achieved by the synergy between two key components: HSIC-Bottleneck Orthogonalization (HBO) implements non-overwritten parameter updates mediated by Hilbert-Schmidt independence criterion in an orthogonal space and EquiAngular Embedding (EAE) enhances decision boundary adaptation between old and new tasks with predefined basis vectors. Extensive experiments demonstrate that our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
- [1015] arXiv:2401.09068 (cross-list from cs.LG) [ pdf , other ]
-
Title: DTMM: Deploying TinyML Models on Extremely Weak IoT Devices with PruningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: DTMM is a library designed for efficient deployment and execution of machine learning models on weak IoT devices such as microcontroller units (MCUs). The motivation for designing DTMM comes from the emerging field of tiny machine learning (TinyML), which explores extending the reach of machine learning to many low-end IoT devices to achieve ubiquitous intelligence. Due to the weak capability of embedded devices, it is necessary to compress models by pruning enough weights before deploying. Although pruning has been studied extensively on many computing platforms, two key issues with pruning methods are exacerbated on MCUs: models need to be deeply compressed without significantly compromising accuracy, and they should perform efficiently after pruning. Current solutions only achieve one of these objectives, but not both. In this paper, we find that pruned models have great potential for efficient deployment and execution on MCUs. Therefore, we propose DTMM with pruning unit selection, pre-execution pruning optimizations, runtime acceleration, and post-execution low-cost storage to fill the gap for efficient deployment and execution of pruned models. It can be integrated into commercial ML frameworks for practical deployment, and a prototype system has been developed. Extensive experiments on various models show promising gains compared to state-of-the-art methods.
- [1016] arXiv:2401.09071 (cross-list from cs.LG) [ pdf , other ]
-
Title: Rethinking Spectral Graph Neural Networks with Spatially Adaptive FilteringSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Whilst spectral Graph Neural Networks (GNNs) are theoretically well-founded in the spectral domain, their practical reliance on polynomial approximation implies a profound linkage to the spatial domain. As previous studies rarely examine spectral GNNs from the spatial perspective, their spatial-domain interpretability remains elusive, e.g., what information is essentially encoded by spectral GNNs in the spatial domain? In this paper, to answer this question, we establish a theoretical connection between spectral filtering and spatial aggregation, unveiling an intrinsic interaction that spectral filtering implicitly leads the original graph to an adapted new graph, explicitly computed for spatial aggregation. Both theoretical and empirical investigations reveal that the adapted new graph not only exhibits non-locality but also accommodates signed edge weights to reflect label consistency among nodes. These findings thus highlight the interpretable role of spectral GNNs in the spatial domain and inspire us to rethink graph spectral filters beyond the fixed-order polynomials, which neglect global information. Built upon the theoretical findings, we revisit the state-of-the-art spectral GNNs and propose a novel Spatially Adaptive Filtering (SAF) framework, which leverages the adapted new graph by spectral filtering for an auxiliary non-local aggregation. Notably, our proposed SAF comprehensively models both node similarity and dissimilarity from a global perspective, therefore alleviating persistent deficiencies of GNNs related to long-range dependencies and graph heterophily. Extensive experiments over 13 node classification benchmarks demonstrate the superiority of our proposed framework to the state-of-the-art models.
- [1017] arXiv:2401.09073 (cross-list from cs.LG) [ pdf , other ]
-
Title: Fixed-Budget Differentially Private Best Arm IdentificationComments: Accepted to ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
Abstract: We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints, when the arm rewards are supported on the unit interval. Given a finite budget $T$ and a privacy parameter $\varepsilon>0$, the goal is to minimise the error probability in finding the arm with the largest mean after $T$ sampling rounds, subject to the constraint that the policy of the decision maker satisfies a certain {\em $\varepsilon$-differential privacy} ($\varepsilon$-DP) constraint. We construct a policy satisfying the $\varepsilon$-DP constraint (called {\sc DP-BAI}) by proposing the principle of {\em maximum absolute determinants}, and derive an upper bound on its error probability. Furthermore, we derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$, with exponents in the two bounds matching order-wise in (a) the sub-optimality gaps of the arms, (b) $\varepsilon$, and (c) the problem complexity that is expressible as the sum of two terms, one characterising the complexity of standard fixed-budget BAI (without privacy constraints), and the other accounting for the $\varepsilon$-DP constraint. Additionally, we present some auxiliary results that contribute to the derivation of the lower bound on the error probability. These results, we posit, may be of independent interest and could prove instrumental in proving lower bounds on error probabilities in several other bandit problems. Whereas prior works provide results for BAI in the fixed-budget regime without privacy constraints or in the fixed-confidence regime with privacy constraints, our work fills the gap in the literature by providing the results for BAI in the fixed-budget regime under the $\varepsilon$-DP constraint.
- [1018] arXiv:2401.09074 (cross-list from cs.LG) [ pdf , other ]
-
Title: Code Simulation Challenges for Large Language ModelsAuthors: Emanuele La Malfa , Christoph Weinhuber , Orazio Torre , Fangru Lin , Anthony Cohn , Nigel Shadbolt , Michael WooldridgeComments: main paper (10 pages) + Appendix (11 pages)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
Abstract: We investigate the extent to which Large Language Models (LLMs) can simulate the execution of computer code and algorithms. We begin by looking at straight line programs, and show that current LLMs demonstrate poor performance even with such simple programs -- performance rapidly degrades with the length of code. We then investigate the ability of LLMs to simulate programs that contain critical paths and redundant instructions. We also go beyond straight line program simulation with sorting algorithms and nested loops, and we show the computational complexity of a routine directly affects the ability of an LLM to simulate its execution. We observe that LLMs execute instructions sequentially and with a low error margin only for short programs or standard procedures. LLMs' code simulation is in tension with their pattern recognition and memorisation capabilities: on tasks where memorisation is detrimental, we propose a novel prompting method to simulate code execution line by line. Empirically, our new Chain of Simulation (CoSm) method improves on the standard Chain of Thought prompting approach by avoiding the pitfalls of memorisation.
- [1019] arXiv:2401.09075 (cross-list from cs.CR) [ pdf , other ]
-
Title: GPT in Sheep's Clothing: The Risk of Customized GPTsSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: In November 2023, OpenAI introduced a new service allowing users to create custom versions of ChatGPT (GPTs) by using specific instructions and knowledge to guide the model's behavior. We aim to raise awareness of the fact that GPTs can be used maliciously, posing privacy and security risks to their users.
- [1020] arXiv:2401.09082 (cross-list from cs.CL) [ pdf , other ]
-
Title: What makes for a 'good' social actor? Using respect as a lens to evaluate interactions with language agentsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: With the growing popularity of dialogue agents based on large language models (LLMs), urgent attention has been drawn to finding ways to ensure their behaviour is ethical and appropriate. These are largely interpreted in terms of the 'HHH' criteria: making outputs more helpful and honest, and avoiding harmful (biased, toxic, or inaccurate) statements. Whilst this semantic focus is useful from the perspective of viewing LLM agents as mere mediums for information, it fails to account for pragmatic factors that can make the same utterance seem more or less offensive or tactless in different social situations. We propose an approach to ethics that is more centred on relational and situational factors, exploring what it means for a system, as a social actor, to treat an individual respectfully in a (series of) interaction(s). Our work anticipates a set of largely unexplored risks at the level of situated interaction, and offers practical suggestions to help LLM technologies behave as 'good' social actors and treat people respectfully.
- [1021] arXiv:2401.09192 (cross-list from cs.LG) [ pdf , other ]
-
Title: Preparing Lessons for Progressive Training on Language ModelsAuthors: Yu Pan , Ye Yuan , Yichun Yin , Jiaxin Shi , Zenglin Xu , Ming Zhang , Lifeng Shang , Xin Jiang , Qun LiuSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The rapid progress of Transformers in artificial intelligence has come at the cost of increased resource consumption and greenhouse gas emissions due to growing model sizes. Prior work suggests using pretrained small models to improve training efficiency, but this approach may not be suitable for new model structures. On the other hand, training from scratch can be slow, and progressively stacking layers often fails to achieve significant acceleration. To address these challenges, we propose a novel method called Apollo, which prep\textbf{a}res lessons for ex\textbf{p}anding \textbf{o}perations by \textbf{l}earning high-\textbf{l}ayer functi\textbf{o}nality during training of low layers. Our approach involves low-value-prioritized sampling (LVPS) to train different depths and weight sharing to facilitate efficient expansion. We also introduce an interpolation method for stable model depth extension. Experiments demonstrate that Apollo achieves state-of-the-art acceleration ratios, even rivaling methods using pretrained models, making it a universal and efficient solution for training deep models while reducing time, financial, and environmental costs.
- [1022] arXiv:2401.09235 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Characterization Theorem for Equivariant Networks with Point-wise ActivationsComments: Accepted at the 12th International Conference on Learning Representations (ICLR 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Equivariant neural networks have shown improved performance, expressiveness and sample complexity on symmetrical domains. But for some specific symmetries, representations, and choice of coordinates, the most common point-wise activations, such as ReLU, are not equivariant, hence they cannot be employed in the design of equivariant neural networks. The theorem we present in this paper describes all possible combinations of finite-dimensional representations, choice of coordinates and point-wise activations to obtain an exactly equivariant layer, generalizing and strengthening existing characterizations. Notable cases of practical relevance are discussed as corollaries. Indeed, we prove that rotation-equivariant networks can only be invariant, as it happens for any network which is equivariant with respect to connected compact groups. Then, we discuss implications of our findings when applied to important instances of exactly equivariant networks. First, we completely characterize permutation equivariant networks such as Invariant Graph Networks with point-wise nonlinearities and their geometric counterparts, highlighting a plethora of models whose expressive power and performance are still unknown. Second, we show that feature spaces of disentangled steerable convolutional neural networks are trivial representations.
- [1023] arXiv:2401.09239 (cross-list from cs.CV) [ pdf , other ]
-
Title: DaFoEs: Mixing Datasets towards the generalization of vision-state deep-learning Force Estimation in Minimally Invasive Robotic SurgerySubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Precisely determining the contact force during safe interaction in Minimally Invasive Robotic Surgery (MIRS) is still an open research challenge. Inspired by post-operative qualitative analysis from surgical videos, the use of cross-modality data driven deep neural network models has been one of the newest approaches to predict sensorless force trends. However, these methods required for large and variable datasets which are not currently available. In this paper, we present a new vision-haptic dataset (DaFoEs) with variable soft environments for the training of deep neural models. In order to reduce the bias from a single dataset, we present a pipeline to generalize different vision and state data inputs for mixed dataset training, using a previously validated dataset with different setup. Finally, we present a variable encoder-decoder architecture to predict the forces done by the laparoscopic tool using single input or sequence of inputs. For input sequence, we use a recurrent decoder, named with the prefix R, and a new temporal sampling to represent the acceleration of the tool. During our training, we demonstrate that single dataset training tends to overfit to the training data domain, but has difficulties on translating the results across new domains. However, dataset mixing presents a good translation with a mean relative estimated force error of 5% and 12% for the recurrent and non-recurrent models respectively. Our method, also marginally increase the effectiveness of transformers for force estimation up to a maximum of ~15%, as the volume of available data is increase by 150%. In conclusion, we demonstrate that mixing experimental set ups for vision-state force estimation in MIRS is a possible approach towards the general solution of the problem.
- [1024] arXiv:2401.09240 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: A Blockchain-based Model for Securing Data Pipeline in a Heterogeneous Information SystemComments: 13 pagesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
Abstract: In our digital world, access to personal and public data has become an item of concern, with challenging security and privacy aspects. Modern information systems are heterogeneous in nature and have an inherent security vulnerability, which is susceptible to data interception and data modification due to unsecured communication data pipelines between connected endpoints. This re-search article presents a blockchain-based model for securing data pipelines in a heterogeneous information system using an integrated multi-hazard early warning system (MHEWS) as a case study. The proposed model utilizes the inherent security features of blockchain technology to address the security and privacy concerns that arise in data pipelines. The model is designed to ensure data integrity, confidentiality, and authenticity in a decentralized manner. The model is evaluated in a hybrid environment using a prototype implementation and simulation experiments with outcomes that demonstrate advantages over traditional approaches for a tamper-proof and immutable data pipeline for data authenticity and integrity using a confidential ledger.
- [1025] arXiv:2401.09243 (cross-list from cs.RO) [ pdf , other ]
-
Title: DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy LearningAuthors: Sabariswaran Mani , Abhranil Chandra , Sreyas Venkataraman , Adyan Rizvi , Yash Sirvi , Soumojit Bhattacharya , Aritra HazraComments: NeurIPS 2023 Train Offline Test Online Workshop and CompetitionSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Robot learning tasks are extremely compute-intensive and hardware-specific. Thus the avenues of tackling these challenges, using a diverse dataset of offline demonstrations that can be used to train robot manipulation agents, is very appealing. The Train-Offline-Test-Online (TOTO) Benchmark provides a well-curated open-source dataset for offline training comprised mostly of expert data and also benchmark scores of the common offline-RL and behaviour cloning agents. In this paper, we introduce DiffClone, an offline algorithm of enhanced behaviour cloning agent with diffusion-based policy learning, and measured the efficacy of our method on real online physical robots at test time. This is also our official submission to the Train-Offline-Test-Online (TOTO) Benchmark Challenge organized at NeurIPS 2023. We experimented with both pre-trained visual representation and agent policies. In our experiments, we find that MOCO finetuned ResNet50 performs the best in comparison to other finetuned representations. Goal state conditioning and mapping to transitions resulted in a minute increase in the success rate and mean-reward. As for the agent policy, we developed DiffClone, a behaviour cloning agent improved using conditional diffusion.
- [1026] arXiv:2401.09252 (cross-list from cs.CV) [ pdf , other ]
-
Title: 3D Scene Geometry Estimation from 360$^\circ$ Imagery: A SurveyAuthors: Thiago Lopes Trugillo da Silveira , Paulo Gamarra Lessa Pinto , Jeffri Erwin Murrugarra Llerena , Claudio Rosito JungComments: Published in ACM Computing SurveysJournal-ref: ACM Comput. Surv. 55, 4, Article 68, 2023Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Abstract: This paper provides a comprehensive survey on pioneer and state-of-the-art 3D scene geometry estimation methodologies based on single, two, or multiple images captured under the omnidirectional optics. We first revisit the basic concepts of the spherical camera model, and review the most common acquisition technologies and representation formats suitable for omnidirectional (also called 360$^\circ$, spherical or panoramic) images and videos. We then survey monocular layout and depth inference approaches, highlighting the recent advances in learning-based solutions suited for spherical data. The classical stereo matching is then revised on the spherical domain, where methodologies for detecting and describing sparse and dense features become crucial. The stereo matching concepts are then extrapolated for multiple view camera setups, categorizing them among light fields, multi-view stereo, and structure from motion (or visual simultaneous localization and mapping). We also compile and discuss commonly adopted datasets and figures of merit indicated for each purpose and list recent results for completeness. We conclude this paper by pointing out current and future trends.
- [1027] arXiv:2401.09286 (cross-list from cs.RO) [ pdf , other ]
-
Title: Deployable Reinforcement Learning with Variable Control RateComments: Paper for AAAI-DAI 2024 workshopSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Deploying controllers trained with Reinforcement Learning (RL) on real robots can be challenging: RL relies on agents' policies being modeled as Markov Decision Processes (MDPs), which assume an inherently discrete passage of time. The use of MDPs results in that nearly all RL-based control systems employ a fixed-rate control strategy with a period (or time step) typically chosen based on the developer's experience or specific characteristics of the application environment. Unfortunately, the system should be controlled at the highest, worst-case frequency to ensure stability, which can demand significant computational and energy resources and hinder the deployability of the controller on onboard hardware. Adhering to the principles of reactive programming, we surmise that applying control actions only when necessary enables the use of simpler hardware and helps reduce energy consumption. We challenge the fixed frequency assumption by proposing a variant of RL with variable control rate. In this approach, the policy decides the action the agent should take as well as the duration of the time step associated with that action. In our new setting, we expand Soft Actor-Critic (SAC) to compute the optimal policy with a variable control rate, introducing the Soft Elastic Actor-Critic (SEAC) algorithm. We show the efficacy of SEAC through a proof-of-concept simulation driving an agent with Newtonian kinematics. Our experiments show higher average returns, shorter task completion times, and reduced computational resources when compared to fixed rate policies.
- [1028] arXiv:2401.09294 (cross-list from cs.SD) [ pdf , other ]
-
Title: T-FOLEY: A Controllable Waveform-Domain Diffusion Model for Temporal-Event-Guided Foley Sound SynthesisSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Abstract: Foley sound, audio content inserted synchronously with videos, plays a critical role in the user experience of multimedia content. Recently, there has been active research in Foley sound synthesis, leveraging the advancements in deep generative models. However, such works mainly focus on replicating a single sound class or a textual sound description, neglecting temporal information, which is crucial in the practical applications of Foley sound. We present T-Foley, a Temporal-event-guided waveform generation model for Foley sound synthesis. T-Foley generates high-quality audio using two conditions: the sound class and temporal event feature. For temporal conditioning, we devise a temporal event feature and a novel conditioning technique named Block-FiLM. T-Foley achieves superior performance in both objective and subjective evaluation metrics and generates Foley sound well-synchronized with the temporal events. Additionally, we showcase T-Foley's practical applications, particularly in scenarios involving vocal mimicry for temporal event control. We show the demo on our companion website.
- [1029] arXiv:2401.09322 (cross-list from cs.RO) [ pdf , other ]
-
Title: FIT-SLAM -- Fisher Information and Traversability estimation-based Active SLAM for exploration in 3D environmentsComments: 6 pages, 6 figures, IEEE ICARA 2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Active visual SLAM finds a wide array of applications in GNSS-Denied sub-terrain environments and outdoor environments for ground robots. To achieve robust localization and mapping accuracy, it is imperative to incorporate the perception considerations in the goal selection and path planning towards the goal during an exploration mission. Through this work, we propose FIT-SLAM (Fisher Information and Traversability estimation-based Active SLAM), a new exploration method tailored for unmanned ground vehicles (UGVs) to explore 3D environments. This approach is devised with the dual objectives of sustaining an efficient exploration rate while optimizing SLAM accuracy. Initially, an estimation of a global traversability map is conducted, which accounts for the environmental constraints pertaining to traversability. Subsequently, we propose a goal candidate selection approach along with a path planning method towards this goal that takes into account the information provided by the landmarks used by the SLAM backend to achieve robust localization and successful path execution . The entire algorithm is tested and evaluated first in a simulated 3D world, followed by a real-world environment and is compared to pre-existing exploration methods. The results obtained during this evaluation demonstrate a significant increase in the exploration rate while effectively minimizing the localization covariance.
- [1030] arXiv:2401.09334 (cross-list from cs.CL) [ pdf , other ]
-
Title: Large Language Models Are Neurosymbolic ReasonersAuthors: Meng Fang , Shilong Deng , Yudi Zhang , Zijing Shi , Ling Chen , Mykola Pechenizkiy , Jun WangComments: Accepted by AAAI 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: A wide range of real-world applications is characterized by their symbolic nature, necessitating a strong capability for symbolic reasoning. This paper investigates the potential application of Large Language Models (LLMs) as symbolic reasoners. We focus on text-based games, significant benchmarks for agents with natural language capabilities, particularly in symbolic tasks like math, map reading, sorting, and applying common sense in text-based worlds. To facilitate these agents, we propose an LLM agent designed to tackle symbolic challenges and achieve in-game objectives. We begin by initializing the LLM agent and informing it of its role. The agent then receives observations and a set of valid actions from the text-based games, along with a specific symbolic module. With these inputs, the LLM agent chooses an action and interacts with the game environments. Our experimental results demonstrate that our method significantly enhances the capability of LLMs as automated agents for symbolic reasoning, and our LLM agent is effective in text-based games involving symbolic tasks, achieving an average performance of 88% across all tasks.
- [1031] arXiv:2401.09340 (cross-list from cs.CV) [ pdf , other ]
-
Title: SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene UnderstandingAuthors: Baoxiong Jia , Yixin Chen , Huangyue Yu , Yan Wang , Xuesong Niu , Tengyu Liu , Qing Li , Siyuan HuangComments: 21 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: 3D vision-language grounding, which focuses on aligning language with the 3D physical environment, stands as a cornerstone in the development of embodied agents. In comparison to recent advancements in the 2D domain, grounding language in 3D scenes faces several significant challenges: (i) the inherent complexity of 3D scenes due to the diverse object configurations, their rich attributes, and intricate relationships; (ii) the scarcity of paired 3D vision-language data to support grounded learning; and (iii) the absence of a unified learning framework to distill knowledge from grounded 3D data. In this work, we aim to address these three major challenges in 3D vision-language by examining the potential of systematically upscaling 3D vision-language learning in indoor environments. We introduce the first million-scale 3D vision-language dataset, SceneVerse, encompassing about 68K 3D indoor scenes and comprising 2.5M vision-language pairs derived from both human annotations and our scalable scene-graph-based generation approach. We demonstrate that this scaling allows for a unified pre-training framework, Grounded Pre-training for Scenes (GPS), for 3D vision-language learning. Through extensive experiments, we showcase the effectiveness of GPS by achieving state-of-the-art performance on all existing 3D visual grounding benchmarks. The vast potential of SceneVerse and GPS is unveiled through zero-shot transfer experiments in the challenging 3D vision-language tasks. Project website: this https URL .
- [1032] arXiv:2401.09352 (cross-list from cs.RO) [ pdf , other ]
-
Title: Neural Contractive Dynamical SystemsAuthors: Hadi Beik-Mohammadi , Søren Hauberg , Georgios Arvanitidis , Nadia Figueroa , Gerhard Neumann , Leonel RozoSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Stability guarantees are crucial when ensuring a fully autonomous robot does not take undesirable or potentially harmful actions. Unfortunately, global stability guarantees are hard to provide in dynamical systems learned from data, especially when the learned dynamics are governed by neural networks. We propose a novel methodology to learn neural contractive dynamical systems, where our neural architecture ensures contraction, and hence, global stability. To efficiently scale the method to high-dimensional dynamical systems, we develop a variant of the variational autoencoder that learns dynamics in a low-dimensional latent representation space while retaining contractive stability after decoding. We further extend our approach to learning contractive systems on the Lie group of rotations to account for full-pose end-effector dynamic motions. The result is the first highly flexible learning architecture that provides contractive stability guarantees with capability to perform obstacle avoidance. Empirically, we demonstrate that our approach encodes the desired dynamics more accurately than the current state-of-the-art, which provides less strong stability guarantees.
- [1033] arXiv:2401.09410 (cross-list from cs.CY) [ pdf , other ]
-
Title: Through the Looking-Glass: Transparency Implications and Challenges in Enterprise AI Knowledge SystemsSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Knowledge can't be disentangled from people. As AI knowledge systems mine vast volumes of work-related data, the knowledge that's being extracted and surfaced is intrinsically linked to the people who create and use it. When these systems get embedded in organizational settings, the information that is brought to the foreground and the information that's pushed to the periphery can influence how individuals see each other and how they see themselves at work. In this paper, we present the looking-glass metaphor and use it to conceptualize AI knowledge systems as systems that reflect and distort, expanding our view on transparency requirements, implications and challenges. We formulate transparency as a key mediator in shaping different ways of seeing, including seeing into the system, which unveils its capabilities, limitations and behavior, and seeing through the system, which shapes workers' perceptions of their own contributions and others within the organization. Recognizing the sociotechnical nature of these systems, we identify three transparency dimensions necessary to realize the value of AI knowledge systems, namely system transparency, procedural transparency and transparency of outcomes. We discuss key challenges hindering the implementation of these forms of transparency, bringing to light the wider sociotechnical gap and highlighting directions for future Computer-supported Cooperative Work (CSCW) research.
- [1034] arXiv:2401.09414 (cross-list from cs.CV) [ pdf , other ]
-
Title: Vlogger: Make Your Dream A VlogAuthors: Shaobin Zhuang , Kunchang Li , Xinyuan Chen , Yaohui Wang , Ziwei Liu , Yu Qiao , Yali WangComments: 16 pages, 8 figures, 11 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM)
Abstract: In this work, we present Vlogger, a generic AI system for generating a minute-level video blog (i.e., vlog) of user descriptions. Different from short videos with a few seconds, vlog often contains a complex storyline with diversified scenes, which is challenging for most existing video generation approaches. To break through this bottleneck, our Vlogger smartly leverages Large Language Model (LLM) as Director and decomposes a long video generation task of vlog into four key stages, where we invoke various foundation models to play the critical roles of vlog professionals, including (1) Script, (2) Actor, (3) ShowMaker, and (4) Voicer. With such a design of mimicking human beings, our Vlogger can generate vlogs through explainable cooperation of top-down planning and bottom-up shooting. Moreover, we introduce a novel video diffusion model, ShowMaker, which serves as a videographer in our Vlogger for generating the video snippet of each shooting scene. By incorporating Script and Actor attentively as textual and visual prompts, it can effectively enhance spatial-temporal coherence in the snippet. Besides, we design a concise mixed training paradigm for ShowMaker, boosting its capacity for both T2V generation and prediction. Finally, the extensive experiments show that our method achieves state-of-the-art performance on zero-shot T2V generation and prediction tasks. More importantly, Vlogger can generate over 5-minute vlogs from open-world descriptions, without loss of video coherence on script and actor. The code and model is all available at this https URL .
- [1035] arXiv:2401.09432 (cross-list from cs.CL) [ pdf , other ]
-
Title: RoleCraft-GLM: Advancing Personalized Role-Playing in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This study presents RoleCraft-GLM, an innovative framework aimed at enhancing personalized role-playing with Large Language Models (LLMs). RoleCraft-GLM addresses the key issue of lacking personalized interactions in conversational AI, and offers a solution with detailed and emotionally nuanced character portrayals. We contribute a unique conversational dataset that shifts from conventional celebrity-centric characters to diverse, non-celebrity personas, thus enhancing the realism and complexity of language modeling interactions. Additionally, our approach includes meticulous character development, ensuring dialogues are both realistic and emotionally resonant. The effectiveness of RoleCraft-GLM is validated through various case studies, highlighting its versatility and skill in different scenarios. Our framework excels in generating dialogues that accurately reflect characters' personality traits and emotions, thereby boosting user engagement. In conclusion, RoleCraft-GLM marks a significant leap in personalized AI interactions, and paves the way for more authentic and immersive AI-assisted role-playing experiences by enabling more nuanced and emotionally rich dialogues
- [1036] arXiv:2401.09442 (cross-list from cs.CV) [ pdf , other ]
-
Title: Object Attribute Matters in Visual Question AnsweringComments: AAAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Visual question answering is a multimodal task that requires the joint comprehension of visual and textual information. However, integrating visual and textual semantics solely through attention layers is insufficient to comprehensively understand and align information from both modalities. Intuitively, object attributes can naturally serve as a bridge to unify them, which has been overlooked in previous research. In this paper, we propose a novel VQA approach from the perspective of utilizing object attribute, aiming to achieve better object-level visual-language alignment and multimodal scene understanding. Specifically, we design an attribute fusion module and a contrastive knowledge distillation module. The attribute fusion module constructs a multimodal graph neural network to fuse attributes and visual features through message passing. The enhanced object-level visual features contribute to solving fine-grained problem like counting-question. The better object-level visual-language alignment aids in understanding multimodal scenes, thereby improving the model's robustness. Furthermore, to augment scene understanding and the out-of-distribution performance, the contrastive knowledge distillation module introduces a series of implicit knowledge. We distill knowledge into attributes through contrastive loss, which further strengthens the representation learning of attribute features and facilitates visual-linguistic alignment. Intensive experiments on six datasets, COCO-QA, VQAv2, VQA-CPv2, VQA-CPv1, VQAvs and TDIUC, show the superiority of the proposed method.
- [1037] arXiv:2401.09443 (cross-list from cs.CV) [ pdf , other ]
-
Title: CRD: Collaborative Representation Distance for Practical Anomaly DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Visual defect detection plays an important role in intelligent industry. Patch based methods consider visual images as a collection of image patches according to positions, which have stronger discriminative ability for small defects in products, e.g. scratches on pills. However, the nearest neighbor search for the query image and the stored patches will occupy $O(n)$ complexity in terms of time and space requirements, posing strict challenges for deployment in edge environments. In this paper, we propose an alternative approach to the distance calculation of image patches via collaborative representation models. Starting from the nearest neighbor distance with $L_0$ constraint, we relax the constraint to $L_2$ constraint and solve the distance quickly in close-formed without actually accessing the original stored collection of image patches. Furthermore, we point out that the main computational burden of this close-formed solution can be pre-computed by high-performance server before deployment. Consequently, the distance calculation on edge devices only requires a simple matrix multiplication, which is extremely lightweight and GPU-friendly. Performance on real industrial scenarios demonstrates that compared to the existing state-of-the-art methods, this distance achieves several hundred times improvement in computational efficiency with slight performance drop, while greatly reducing memory overhead.
- [1038] arXiv:2401.09446 (cross-list from cs.CV) [ pdf , other ]
-
Title: Explainable Multimodal Sentiment Analysis on Bengali MemesAuthors: Kazi Toufique Elahi , Tasnuva Binte Rahman , Shakil Shahriar , Samir Sarker , Sajib Kumar Saha Joy , Faisal Muhammad ShahSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Memes have become a distinctive and effective form of communication in the digital era, attracting online communities and cutting across cultural barriers. Even though memes are frequently linked with humor, they have an amazing capacity to convey a wide range of emotions, including happiness, sarcasm, frustration, and more. Understanding and interpreting the sentiment underlying memes has become crucial in the age of information. Previous research has explored text-based, image-based, and multimodal approaches, leading to the development of models like CAPSAN and PromptHate for detecting various meme categories. However, the study of low-resource languages like Bengali memes remains scarce, with limited availability of publicly accessible datasets. A recent contribution includes the introduction of the MemoSen dataset. However, the achieved accuracy is notably low, and the dataset suffers from imbalanced distribution. In this study, we employed a multimodal approach using ResNet50 and BanglishBERT and achieved a satisfactory result of 0.71 weighted F1-score, performed comparison with unimodal approaches, and interpreted behaviors of the models using explainable artificial intelligence (XAI) techniques.
- [1039] arXiv:2401.09450 (cross-list from cs.CY) [ pdf , other ]
-
Title: Joining Forces for Pathology Diagnostics with AI Assistance: The EMPAIA InitiativeAuthors: Norman Zerbe , Lars Ole Schwen , Christian Geißler , Katja Wiesemann , Tom Bisson , Peter Boor , Rita Carvalho , Michael Franz , Christoph Jansen , Tim-Rasmus Kiehl , Björn Lindequist , Nora Charlotte Pohlan , Sarah Schmell , Klaus Strohmenger , Falk Zakrzewski , Markus Plass , Michael Takla , Tobias Küster , André Homeyer , Peter HufnaglSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Abstract: Over the past decade, artificial intelligence (AI) methods in pathology have advanced substantially. However, integration into routine clinical practice has been slow due to numerous challenges, including technical and regulatory hurdles in translating research results into clinical diagnostic products and the lack of standardized interfaces. The open and vendor-neutral EMPAIA initiative addresses these challenges. Here, we provide an overview of EMPAIA's achievements and lessons learned. EMPAIA integrates various stakeholders of the pathology AI ecosystem, i.e., pathologists, computer scientists, and industry. In close collaboration, we developed technical interoperability standards, recommendations for AI testing and product development, and explainability methods. We implemented the modular and open-source EMPAIA platform and successfully integrated 14 AI-based image analysis apps from 8 different vendors, demonstrating how different apps can use a single standardized interface. We prioritized requirements and evaluated the use of AI in real clinical settings with 14 different pathology laboratories in Europe and Asia. In addition to technical developments, we created a forum for all stakeholders to share information and experiences on digital pathology and AI. Commercial, clinical, and academic stakeholders can now adopt EMPAIA's common open-source interfaces, providing a unique opportunity for large-scale standardization and streamlining of processes. Further efforts are needed to effectively and broadly establish AI assistance in routine laboratory use. To this end, a sustainable infrastructure, the non-profit association EMPAIA International, has been established to continue standardization and support broad implementation and advocacy for an AI-assisted digital pathology future.
- [1040] arXiv:2401.09452 (cross-list from cs.LG) [ pdf , other ]
-
Title: Incorporating Riemannian Geometric Features for Learning Coefficient of Pressure Distributions on Airplane WingsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The aerodynamic coefficients of aircrafts are significantly impacted by its geometry, especially when the angle of attack (AoA) is large. In the field of aerodynamics, traditional polynomial-based parameterization uses as few parameters as possible to describe the geometry of an airfoil. However, because the 3D geometry of a wing is more complicated than the 2D airfoil, polynomial-based parameterizations have difficulty in accurately representing the entire shape of a wing in 3D space. Existing deep learning-based methods can extract massive latent neural representations for the shape of 2D airfoils or 2D slices of wings. Recent studies highlight that directly taking geometric features as inputs to the neural networks can improve the accuracy of predicted aerodynamic coefficients. Motivated by geometry theory, we propose to incorporate Riemannian geometric features for learning Coefficient of Pressure (CP) distributions on wing surfaces. Our method calculates geometric features (Riemannian metric, connection, and curvature) and further inputs the geometric features, coordinates and flight conditions into a deep learning model to predict the CP distribution. Experimental results show that our method, compared to state-of-the-art Deep Attention Network (DAN), reduces the predicted mean square error (MSE) of CP by an average of 8.41% for the DLR-F11 aircraft test set.
- [1041] arXiv:2401.09454 (cross-list from cs.CV) [ pdf , other ]
-
Title: Voila-A: Aligning Vision-Language Models with User's Gaze AttentionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: In recent years, the integration of vision and language understanding has led to significant advancements in artificial intelligence, particularly through Vision-Language Models (VLMs). However, existing VLMs face challenges in handling real-world applications with complex scenes and multiple objects, as well as aligning their focus with the diverse attention patterns of human users. In this paper, we introduce gaze information, feasibly collected by AR or VR devices, as a proxy for human attention to guide VLMs and propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications. First, we collect hundreds of minutes of gaze data to demonstrate that we can mimic human gaze modalities using localized narratives. We then design an automatic data annotation pipeline utilizing GPT-4 to generate the VOILA-COCO dataset. Additionally, we innovate the Voila Perceiver modules to integrate gaze information into VLMs while preserving their pretrained knowledge. We evaluate Voila-A using a hold-out validation set and a newly collected VOILA-GAZE Testset, which features real-life scenarios captured with a gaze-tracking device. Our experimental results demonstrate that Voila-A significantly outperforms several baseline models. By aligning model attention with human gaze patterns, Voila-A paves the way for more intuitive, user-centric VLMs and fosters engaging human-AI interaction across a wide range of applications.
- [1042] arXiv:2401.09455 (cross-list from cs.NI) [ pdf , other ]
-
Title: Dynamic Routing for Integrated Satellite-Terrestrial Networks: A Constrained Multi-Agent Reinforcement Learning ApproachSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: The integrated satellite-terrestrial network (ISTN) system has experienced significant growth, offering seamless communication services in remote areas with limited terrestrial infrastructure. However, designing a routing scheme for ISTN is exceedingly difficult, primarily due to the heightened complexity resulting from the inclusion of additional ground stations, along with the requirement to satisfy various constraints related to satellite service quality. To address these challenges, we study packet routing with ground stations and satellites working jointly to transmit packets, while prioritizing fast communication and meeting energy efficiency and packet loss requirements. Specifically, we formulate the problem of packet routing with constraints as a max-min problem using the Lagrange method. Then we propose a novel constrained Multi-Agent reinforcement learning (MARL) dynamic routing algorithm named CMADR, which efficiently balances objective improvement and constraint satisfaction during the updating of policy and Lagrange multipliers. Finally, we conduct extensive experiments and an ablation study using the OneWeb and Telesat mega-constellations. Results demonstrate that CMADR reduces the packet delay by a minimum of 21% and 15%, while meeting stringent energy consumption and packet loss rate constraints, outperforming several baseline algorithms.
- [1043] arXiv:2401.09459 (cross-list from cs.CY) [ pdf , other ]
-
Title: What's my role? Modelling responsibility for AI-based safety-critical systemsComments: 22 pages, 7 figures, 2 tablesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: AI-Based Safety-Critical Systems (AI-SCS) are being increasingly deployed in the real world. These can pose a risk of harm to people and the environment. Reducing that risk is an overarching priority during development and operation. As more AI-SCS become autonomous, a layer of risk management via human intervention has been removed. Following an accident it will be important to identify causal contributions and the different responsible actors behind those to learn from mistakes and prevent similar future events. Many authors have commented on the "responsibility gap" where it is difficult for developers and manufacturers to be held responsible for harmful behaviour of an AI-SCS. This is due to the complex development cycle for AI, uncertainty in AI performance, and dynamic operating environment. A human operator can become a "liability sink" absorbing blame for the consequences of AI-SCS outputs they weren't responsible for creating, and may not have understanding of.
This cross-disciplinary paper considers different senses of responsibility (role, moral, legal and causal), and how they apply in the context of AI-SCS safety. We use a core concept (Actor(A) is responsible for Occurrence(O)) to create role responsibility models, producing a practical method to capture responsibility relationships and provide clarity on the previously identified responsibility issues. Our paper demonstrates the approach with two examples: a retrospective analysis of the Tempe Arizona fatal collision involving an autonomous vehicle, and a safety focused predictive role-responsibility analysis for an AI-based diabetes co-morbidity predictor. In both examples our primary focus is on safety, aiming to reduce unfair or disproportionate blame being placed on operators or developers. We present a discussion and avenues for future research. - [1044] arXiv:2401.09467 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Offline Handwriting Signature Verification: A Transfer Learning and Feature Selection ApproachComments: 11 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Handwritten signature verification poses a formidable challenge in biometrics and document authenticity. The objective is to ascertain the authenticity of a provided handwritten signature, distinguishing between genuine and forged ones. This issue has many applications in sectors such as finance, legal documentation, and security. Currently, the field of computer vision and machine learning has made significant progress in the domain of handwritten signature verification. The outcomes, however, may be enhanced depending on the acquired findings, the structure of the datasets, and the used models. Four stages make up our suggested strategy. First, we collected a large dataset of 12600 images from 420 distinct individuals, and each individual has 30 signatures of a certain kind (All authors signatures are genuine). In the subsequent stage, the best features from each image were extracted using a deep learning model named MobileNetV2. During the feature selection step, three selectors neighborhood component analysis (NCA), Chi2, and mutual info (MI) were used to pull out 200, 300, 400, and 500 features, giving a total of 12 feature vectors. Finally, 12 results have been obtained by applying machine learning techniques such as SVM with kernels (rbf, poly, and linear), KNN, DT, Linear Discriminant Analysis, and Naive Bayes. Without employing feature selection techniques, our suggested offline signature verification achieved a classification accuracy of 91.3%, whereas using the NCA feature selection approach with just 300 features it achieved a classification accuracy of 97.7%. High classification accuracy was achieved using the designed and suggested model, which also has the benefit of being a self-organized framework. Consequently, using the optimum minimally chosen features, the proposed method could identify the best model performance and result validation prediction vectors.
- [1045] arXiv:2401.09473 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Business and ethical concerns in domestic Conversational Generative AI-empowered multi-robot systemsAuthors: Rebekah Rousi , Hooman Samani , Niko Mäkitalo , Ville Vakkuri , Simo Linkola , Kai-Kristian Kemell , Paulius Daubaris , Ilenia Fronza , Tommi Mikkonen , Pekka AbrahamssonComments: 15 pages, 4 figures, International Conference on Software BusinessSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Business and technology are intricately connected through logic and design. They are equally sensitive to societal changes and may be devastated by scandal. Cooperative multi-robot systems (MRSs) are on the rise, allowing robots of different types and brands to work together in diverse contexts. Generative artificial intelligence has been a dominant topic in recent artificial intelligence (AI) discussions due to its capacity to mimic humans through the use of natural language and the production of media, including deep fakes. In this article, we focus specifically on the conversational aspects of generative AI, and hence use the term Conversational Generative artificial intelligence (CGI). Like MRSs, CGIs have enormous potential for revolutionizing processes across sectors and transforming the way humans conduct business. From a business perspective, cooperative MRSs alone, with potential conflicts of interest, privacy practices, and safety concerns, require ethical examination. MRSs empowered by CGIs demand multi-dimensional and sophisticated methods to uncover imminent ethical pitfalls. This study focuses on ethics in CGI-empowered MRSs while reporting the stages of developing the MORUL model.
- [1046] arXiv:2401.09476 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: A Framework for Agricultural Food Supply Chain using BlockchainAuthors: Sudarssan NComments: 5 Pages, 5 figures, Under ReviewSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The main aim of the paper is to create a trust and transparency in the food supply chain system, ensuring food safety for everyone with the help of Blockchain Technology. Food supply chain is the process of tracing a crop from the farmer or producer to the buyer. With the advent of blockchain, providing a safe and fraud-free environment for the provision of numerous agricultural necessities has become much easier. Because of the globalization of trade, the present supply chain market today includes various companies involving integration of data, complex transactions and distribution. Information tamper resistance, supply-demand relationships, and traceable oversight are all difficulties that arise as a result of this. Blockchain is a distributed ledger technology that can provide information that is resistant to tampering. This strategy can eliminate the need for a centralized trusted authority, intermediaries, and business histories, allowing for increased production and security while maintaining the highest levels of integrity, liability, and safety. In order to have an integrity and transparency in food supply chain in the agricultural sector, a framework is proposed here based on block chain and IoT.
- [1047] arXiv:2401.09479 (cross-list from cs.CR) [ pdf , other ]
-
Title: Uncertainty-Aware Hardware Trojan Detection Using Multimodal Deep LearningComments: 2024 Design, Automation and Test in Europe Conference | The European Event for Electronic System Design & Test (accepted)Journal-ref: 2024 Design, Automation and Test in Europe Conference | The European Event for Electronic System Design & TestSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The risk of hardware Trojans being inserted at various stages of chip production has increased in a zero-trust fabless era. To counter this, various machine learning solutions have been developed for the detection of hardware Trojans. While most of the focus has been on either a statistical or deep learning approach, the limited number of Trojan-infected benchmarks affects the detection accuracy and restricts the possibility of detecting zero-day Trojans. To close the gap, we first employ generative adversarial networks to amplify our data in two alternative representation modalities, a graph and a tabular, ensuring that the dataset is distributed in a representative manner. Further, we propose a multimodal deep learning approach to detect hardware Trojans and evaluate the results from both early fusion and late fusion strategies. We also estimate the uncertainty quantification metrics of each prediction for risk-aware decision-making. The outcomes not only confirms the efficacy of our proposed hardware Trojan detection method but also opens a new door for future studies employing multimodality and uncertainty quantification to address other hardware security challenges.
- [1048] arXiv:2401.09489 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: PUPAE: Intuitive and Actionable Explanations for Time Series AnomaliesAuthors: Audrey Der , Chin-Chia Michael Yeh , Yan Zheng , Junpeng Wang , Zhongfang Zhuang , Liang Wang , Wei Zhang , Eamonn J. KeoghComments: 9 Page Manuscript, 1 Page Supplementary (Supplement not published in conference proceedings.)Journal-ref: SIAM SDM 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In recent years there has been significant progress in time series anomaly detection. However, after detecting an (perhaps tentative) anomaly, can we explain it? Such explanations would be useful to triage anomalies. For example, in an oil refinery, should we respond to an anomaly by dispatching a hydraulic engineer, or an intern to replace the battery on a sensor? There have been some parallel efforts to explain anomalies, however many proposed techniques produce explanations that are indirect, and often seem more complex than the anomaly they seek to explain. Our review of the literature/checklists/user-manuals used by frontline practitioners in various domains reveals an interesting near-universal commonality. Most practitioners discuss, explain and report anomalies in the following format: The anomaly would be like normal data A, if not for the corruption B. The reader will appreciate that is a type of counterfactual explanation. In this work we introduce a domain agnostic counterfactual explanation technique to produce explanations for time series anomalies. As we will show, our method can produce both visual and text-based explanations that are objectively correct, intuitive and in many circumstances, directly actionable.
- [1049] arXiv:2401.09498 (cross-list from cs.LG) [ pdf , other ]
-
Title: Technical Report: On the Convergence of Gossip Learning in the Presence of Node InaccessibilitySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Gossip learning (GL), as a decentralized alternative to federated learning (FL), is more suitable for resource-constrained wireless networks, such as Flying Ad-Hoc Networks (FANETs) that are formed by unmanned aerial vehicles (UAVs). GL can significantly enhance the efficiency and extend the battery life of UAV networks. Despite the advantages, the performance of GL is strongly affected by data distribution, communication speed, and network connectivity. However, how these factors influence the GL convergence is still unclear. Existing work studied the convergence of GL based on a virtual quantity for the sake of convenience, which failed to reflect the real state of the network when some nodes are inaccessible. In this paper, we formulate and investigate the impact of inaccessible nodes to GL under a dynamic network topology. We first decompose the weight divergence by whether the node is accessible or not. Then, we investigate the GL convergence under the dynamic of node accessibility and theoretically provide how the number of inaccessible nodes, data non-i.i.d.-ness, and duration of inaccessibility affect the convergence. Extensive experiments are carried out in practical settings to comprehensively verify the correctness of our theoretical findings.
- [1050] arXiv:2401.09516 (cross-list from cs.LG) [ pdf , other ]
-
Title: Accelerating Data Generation for Neural Operators via Krylov Subspace RecyclingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
Abstract: Learning neural operators for solving partial differential equations (PDEs) has attracted great attention due to its high inference efficiency. However, training such operators requires generating a substantial amount of labeled data, i.e., PDE problems together with their solutions. The data generation process is exceptionally time-consuming, as it involves solving numerous systems of linear equations to obtain numerical solutions to the PDEs. Many existing methods solve these systems independently without considering their inherent similarities, resulting in extremely redundant computations. To tackle this problem, we propose a novel method, namely Sorting Krylov Recycling (SKR), to boost the efficiency of solving these systems, thus significantly accelerating data generation for neural operators training. To the best of our knowledge, SKR is the first attempt to address the time-consuming nature of data generation for learning neural operators. The working horse of SKR is Krylov subspace recycling, a powerful technique for solving a series of interrelated systems by leveraging their inherent similarities. Specifically, SKR employs a sorting algorithm to arrange these systems in a sequence, where adjacent systems exhibit high similarities. Then it equips a solver with Krylov subspace recycling to solve the systems sequentially instead of independently, thus effectively enhancing the solving efficiency. Both theoretical analysis and extensive experiments demonstrate that SKR can significantly accelerate neural operator data generation, achieving a remarkable speedup of up to 13.9 times.
- [1051] arXiv:2401.09553 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: BERTologyNavigator: Advanced Question Answering with BERT-based SemanticsAuthors: Shreya Rajpal (1,2), Ricardo Usbeck (1) ((1) Universität Hamburg, Hamburg, Germany,(2) Vellore Institute of Technology, Vellore, Tamil Nadu, India)Comments: Accepted in Scholarly QALD Challenge @ ISWC 2023Journal-ref: Joint Proceedings of Scholarly QALD 2023 and SemREC 2023 co-located with 22nd International Semantic Web Conference ISWC 2023. Athens, Greece, November 6-10, 2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The development and integration of knowledge graphs and language models has significance in artificial intelligence and natural language processing. In this study, we introduce the BERTologyNavigator -- a two-phased system that combines relation extraction techniques and BERT embeddings to navigate the relationships within the DBLP Knowledge Graph (KG). Our approach focuses on extracting one-hop relations and labelled candidate pairs in the first phases. This is followed by employing BERT's CLS embeddings and additional heuristics for relation selection in the second phase. Our system reaches an F1 score of 0.2175 on the DBLP QuAD Final test dataset for Scholarly QALD and 0.98 F1 score on the subset of the DBLP QuAD test dataset during the QA phase.
- [1052] arXiv:2401.09555 (cross-list from cs.LG) [ pdf , other ]
-
Title: Improving Classification Performance With Human Feedback: Label a few, we label the restSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In the realm of artificial intelligence, where a vast majority of data is unstructured, obtaining substantial amounts of labeled data to train supervised machine learning models poses a significant challenge. To address this, we delve into few-shot and active learning, where are goal is to improve AI models with human feedback on a few labeled examples. This paper focuses on understanding how a continuous feedback loop can refine models, thereby enhancing their accuracy, recall, and precision through incremental human input. By employing Large Language Models (LLMs) such as GPT-3.5, BERT, and SetFit, we aim to analyze the efficacy of using a limited number of labeled examples to substantially improve model accuracy. We benchmark this approach on the Financial Phrasebank, Banking, Craigslist, Trec, Amazon Reviews datasets to prove that with just a few labeled examples, we are able to surpass the accuracy of zero shot large language models to provide enhanced text classification performance. We demonstrate that rather than needing to manually label millions of rows of data, we just need to label a few and the model can effectively predict the rest.
- [1053] arXiv:2401.09566 (cross-list from cs.CL) [ pdf , other ]
-
Title: Aligning Large Language Models with Counterfactual DPOAuthors: Bradley ButcherSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Advancements in large language models (LLMs) have demonstrated remarkable capabilities across a diverse range of applications. These models excel in generating text completions that are contextually coherent and cover an extensive array of subjects. However, the vast datasets required for their training make aligning response styles during the pretraining and instruction tuning phases challenging. Consequently, an additional alignment phase is typically employed, wherein the model is further trained with human preference data to better align its outputs with human expectations. While this process doesn't introduce new capabilities per se, it does accentuate generation styles innate to the model. This paper explores the utilization of counterfactual prompting within the framework of Direct Preference Optimization (DPO) to align the model's style without relying on human intervention. We demonstrate that this method effectively instils desirable behaviour, mitigates undesirable ones, and encourages the model to disregard inappropriate instructions. Our findings suggest that counterfactual prompting with DPO presents a low-resource way to fine-tune LLMs to meet the demands for responsible and ethically aligned AI systems.
- [1054] arXiv:2401.09572 (cross-list from cs.IR) [ pdf , other ]
-
Title: Handling Large-scale Cardinality in building recommendation systemsSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Effective recommendation systems rely on capturing user preferences, often requiring incorporating numerous features such as universally unique identifiers (UUIDs) of entities. However, the exceptionally high cardinality of UUIDs poses a significant challenge in terms of model degradation and increased model size due to sparsity. This paper presents two innovative techniques to address the challenge of high cardinality in recommendation systems. Specifically, we propose a bag-of-words approach, combined with layer sharing, to substantially decrease the model size while improving performance. Our techniques were evaluated through offline and online experiments on Uber use cases, resulting in promising results demonstrating our approach's effectiveness in optimizing recommendation systems and enhancing their overall performance.
- [1055] arXiv:2401.09615 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Learning Shortcuts: On the Misleading Promise of NLU in Language ModelsComments: Accepted at HICSS-SDPS 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The advent of large language models (LLMs) has enabled significant performance gains in the field of natural language processing. However, recent studies have found that LLMs often resort to shortcuts when performing tasks, creating an illusion of enhanced performance while lacking generalizability in their decision rules. This phenomenon introduces challenges in accurately assessing natural language understanding in LLMs. Our paper provides a concise survey of relevant research in this area and puts forth a perspective on the implications of shortcut learning in the evaluation of language models, specifically for NLU tasks. This paper urges more research efforts to be put towards deepening our comprehension of shortcut learning, contributing to the development of more robust language models, and raising the standards of NLU evaluation in real-world scenarios.
- [1056] arXiv:2401.09637 (cross-list from cs.HC) [ pdf , other ]
-
Title: Impact of Large Language Model Assistance on Patients Reading Clinical Notes: A Mixed-Methods StudyAuthors: Niklas Mannhardt , Elizabeth Bondi-Kelly , Barbara Lam , Chloe O'Connell , Mercy Asiedu , Hussein Mozannar , Monica Agrawal , Alejandro Buendia , Tatiana Urman , Irbaz B. Riaz , Catherine E. Ricciardi , Marzyeh Ghassemi , David SontagSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Patients derive numerous benefits from reading their clinical notes, including an increased sense of control over their health and improved understanding of their care plan. However, complex medical concepts and jargon within clinical notes hinder patient comprehension and may lead to anxiety. We developed a patient-facing tool to make clinical notes more readable, leveraging large language models (LLMs) to simplify, extract information from, and add context to notes. We prompt engineered GPT-4 to perform these augmentation tasks on real clinical notes donated by breast cancer survivors and synthetic notes generated by a clinician, a total of 12 notes with 3868 words. In June 2023, 200 female-identifying US-based participants were randomly assigned three clinical notes with varying levels of augmentations using our tool. Participants answered questions about each note, evaluating their understanding of follow-up actions and self-reported confidence. We found that augmentations were associated with a significant increase in action understanding score (0.63 $\pm$ 0.04 for select augmentations, compared to 0.54 $\pm$ 0.02 for the control) with p=0.002. In-depth interviews of self-identifying breast cancer patients (N=7) were also conducted via video conferencing. Augmentations, especially definitions, elicited positive responses among the seven participants, with some concerns about relying on LLMs. Augmentations were evaluated for errors by clinicians, and we found misleading errors occur, with errors more common in real donated notes than synthetic notes, illustrating the importance of carefully written clinical notes. Augmentations improve some but not all readability metrics. This work demonstrates the potential of LLMs to improve patients' experience with clinical notes at a lower burden to clinicians. However, having a human in the loop is important to correct potential model errors.
- [1057] arXiv:2401.09640 (cross-list from eess.SY) [ pdf , ps , other ]
-
Title: Blackout Mitigation via Physics-guided RLSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI)
Abstract: This paper considers the sequential design of remedial control actions in response to system anomalies for the ultimate objective of preventing blackouts. A physics-guided reinforcement learning (RL) framework is designed to identify effective sequences of real-time remedial look-ahead decisions accounting for the long-term impact on the system's stability. The paper considers a space of control actions that involve both discrete-valued transmission line-switching decisions (line reconnections and removals) and continuous-valued generator adjustments. To identify an effective blackout mitigation policy, a physics-guided approach is designed that uses power-flow sensitivity factors associated with the power transmission network to guide the RL exploration during agent training. Comprehensive empirical evaluations using the open-source Grid2Op platform demonstrate the notable advantages of incorporating physical signals into RL decisions, establishing the gains of the proposed physics-guided approach compared to its black box counterparts. One important observation is that strategically~\emph{removing} transmission lines, in conjunction with multiple real-time generator adjustments, often renders effective long-term decisions that are likely to prevent or delay blackouts.
- [1058] arXiv:2401.09646 (cross-list from cs.LG) [ pdf , other ]
-
Title: ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate ChangeAuthors: David Thulke , Yingbo Gao , Petrus Pelser , Rein Brune , Rricha Jalota , Floris Fok , Michael Ramos , Ian van Wyk , Abdallah Nasir , Hayden Goldstein , Taylor Tragemann , Katie Nguyen , Ariana Fowler , Andrew Stanco , Jon Gabriel , Jordan Taylor , Dean Moro , Evgenii Tsymbalov , Juliette de Waal , Evgeny Matusov , Mudar Yaghi , Mohammad Shihadah , Hermann Ney , Christian Dugast , Jonathan Dotan , Daniel ErasmusSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: This paper introduces ClimateGPT, a model family of domain-specific large language models that synthesize interdisciplinary research on climate change. We trained two 7B models from scratch on a science-oriented dataset of 300B tokens. For the first model, the 4.2B domain-specific tokens were included during pre-training and the second was adapted to the climate domain after pre-training. Additionally, ClimateGPT-7B, 13B and 70B are continuously pre-trained from Llama~2 on a domain-specific dataset of 4.2B tokens. Each model is instruction fine-tuned on a high-quality and human-generated domain-specific dataset that has been created in close cooperation with climate scientists. To reduce the number of hallucinations, we optimize the model for retrieval augmentation and propose a hierarchical retrieval strategy. To increase the accessibility of our model to non-English speakers, we propose to make use of cascaded machine translation and show that this approach can perform comparably to natively multilingual models while being easier to scale to a large number of languages. Further, to address the intrinsic interdisciplinary aspect of climate change we consider different research perspectives. Therefore, the model can produce in-depth answers focusing on different perspectives in addition to an overall answer. We propose a suite of automatic climate-specific benchmarks to evaluate LLMs. On these benchmarks, ClimateGPT-7B performs on par with the ten times larger Llama-2-70B Chat model while not degrading results on general domain benchmarks. Our human evaluation confirms the trends we saw in our benchmarks. All models were trained and evaluated using renewable energy and are released publicly.
- [1059] arXiv:2401.09651 (cross-list from cs.LG) [ pdf , other ]
-
Title: Convex and Bilevel Optimization for Neuro-Symbolic Inference and LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: We address a key challenge for neuro-symbolic (NeSy) systems by leveraging convex and bilevel optimization techniques to develop a general gradient-based framework for end-to-end neural and symbolic parameter learning. The applicability of our framework is demonstrated with NeuPSL, a state-of-the-art NeSy architecture. To achieve this, we propose a smooth primal and dual formulation of NeuPSL inference and show learning gradients are functions of the optimal dual variables. Additionally, we develop a dual block coordinate descent algorithm for the new formulation that naturally exploits warm-starts. This leads to over 100x learning runtime improvements over the current best NeuPSL inference method. Finally, we provide extensive empirical evaluations across $8$ datasets covering a range of tasks and demonstrate our learning framework achieves up to a 16% point prediction performance improvement over alternative learning methods.
- [1060] arXiv:2401.09656 (cross-list from cs.LG) [ pdf , other ]
-
Title: Mobility Accelerates Learning: Convergence Analysis on Hierarchical Federated Learning in Vehicular NetworksComments: Submitted to IEEE for possible publicationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Hierarchical federated learning (HFL) enables distributed training of models across multiple devices with the help of several edge servers and a cloud edge server in a privacy-preserving manner. In this paper, we consider HFL with highly mobile devices, mainly targeting at vehicular networks. Through convergence analysis, we show that mobility influences the convergence speed by both fusing the edge data and shuffling the edge models. While mobility is usually considered as a challenge from the perspective of communication, we prove that it increases the convergence speed of HFL with edge-level heterogeneous data, since more diverse data can be incorporated. Furthermore, we demonstrate that a higher speed leads to faster convergence, since it accelerates the fusion of data. Simulation results show that mobility increases the model accuracy of HFL by up to 15.1% when training a convolutional neural network on the CIFAR-10 dataset.
- [1061] arXiv:2401.09666 (cross-list from eess.SY) [ pdf , other ]
-
Title: Traffic Smoothing Controllers for Autonomous Vehicles Using Deep Reinforcement Learning and Real-World Trajectory DataAuthors: Nathan Lichtlé , Kathy Jang , Adit Shah , Eugene Vinitsky , Jonathan W. Lee , Alexandre M. BayenComments: Accepted to be published as part of the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC) 2023, Bilbao, Spain, September 24-28, 2023Subjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: Designing traffic-smoothing cruise controllers that can be deployed onto autonomous vehicles is a key step towards improving traffic flow, reducing congestion, and enhancing fuel efficiency in mixed autonomy traffic. We bypass the common issue of having to carefully fine-tune a large traffic microsimulator by leveraging real-world trajectory data from the I-24 highway in Tennessee, replayed in a one-lane simulation. Using standard deep reinforcement learning methods, we train energy-reducing wave-smoothing policies. As an input to the agent, we observe the speed and distance of only the vehicle in front, which are local states readily available on most recent vehicles, as well as non-local observations about the downstream state of the traffic. We show that at a low 4% autonomous vehicle penetration rate, we achieve significant fuel savings of over 15% on trajectories exhibiting many stop-and-go waves. Finally, we analyze the smoothing effect of the controllers and demonstrate robustness to adding lane-changing into the simulation as well as the removal of downstream information.
- [1062] arXiv:2401.09671 (cross-list from cs.LG) [ pdf , other ]
-
Title: Towards Identifiable Unsupervised Domain Translation: A Diversified Distribution Matching ApproachSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Unsupervised domain translation (UDT) aims to find functions that convert samples from one domain (e.g., sketches) to another domain (e.g., photos) without changing the high-level semantic meaning (also referred to as ``content''). The translation functions are often sought by probability distribution matching of the transformed source domain and target domain. CycleGAN stands as arguably the most representative approach among this line of work. However, it was noticed in the literature that CycleGAN and variants could fail to identify the desired translation functions and produce content-misaligned translations. This limitation arises due to the presence of multiple translation functions -- referred to as ``measure-preserving automorphism" (MPA) -- in the solution space of the learning criteria. Despite awareness of such identifiability issues, solutions have remained elusive. This study delves into the core identifiability inquiry and introduces an MPA elimination theory. Our analysis shows that MPA is unlikely to exist, if multiple pairs of diverse cross-domain conditional distributions are matched by the learning function. Our theory leads to a UDT learner using distribution matching over auxiliary variable-induced subsets of the domains -- other than over the entire data domains as in the classical approaches. The proposed framework is the first to rigorously establish translation identifiability under reasonable UDT settings, to our best knowledge. Experiments corroborate with our theoretical claims.
- [1063] arXiv:2401.09691 (cross-list from cs.RO) [ pdf , other ]
-
Title: Imitation Learning Inputting Image Feature to Each Layer of Neural NetworkComments: 6 pages, 4 figures, Accepted at AMC2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Imitation learning enables robots to learn and replicate human behavior from training data. Recent advances in machine learning enable end-to-end learning approaches that directly process high-dimensional observation data, such as images. However, these approaches face a critical challenge when processing data from multiple modalities, inadvertently ignoring data with a lower correlation to the desired output, especially when using short sampling periods. This paper presents a useful method to address this challenge, which amplifies the influence of data with a relatively low correlation to the output by inputting the data into each neural network layer. The proposed approach effectively incorporates diverse data sources into the learning process. Through experiments using a simple pick-and-place operation with raw images and joint information as input, significant improvements in success rates are demonstrated even when dealing with data from short sampling periods.
- [1064] arXiv:2401.09695 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Should ChatGPT Write Your Breakup Text? Exploring the Role of AI in Relationship DissolutionSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Relationships are essential to our happiness and wellbeing. The dissolution of a relationship, the final stage of relationship's lifecycle and one of the most stressful events in an individual's life, can have profound and long-lasting impacts on people. With the breakup process increasingly facilitated by computer-mediated communication (CMC), and the likely future influence of AI-mediated communication (AIMC) tools, we conducted a semi-structured interview study with 21 participants. We aim to understand: 1) the current role of technology in the breakup process, 2) the needs and support individuals have during the process, and 3) how AI might address these needs. Our research shows that people have distinct needs at various stages of ending a relationship. Presently, technology is used for information gathering and community support, acting as a catalyst for breakups, enabling ghosting and blocking, and facilitating communication. Participants anticipate that AI could aid in sense-making of their relationship leading up to the breakup, act as a mediator, assist in crafting appropriate wording, tones, and language during breakup conversations, and support companionship, reflection, recovery, and growth after a breakup. Our findings also demonstrate an overlap between the breakup process and the Transtheoretical Model (TTM) of behavior change. Through the lens of TTM, we explore the potential support and affordances AI could offer in breakups, including its benefits and the necessary precautions regarding AI's role in this sensitive process.
- [1065] arXiv:2401.09699 (cross-list from cs.CL) [ pdf , other ]
-
Title: Curriculum Recommendations Using Transformer Base Model with InfoNCE Loss And Language Switching MethodComments: 4pages, 2 figures, ICAICA2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The Curriculum Recommendations paradigm is dedicated to fostering learning equality within the ever-evolving realms of educational technology and curriculum development. In acknowledging the inherent obstacles posed by existing methodologies, such as content conflicts and disruptions from language translation, this paradigm aims to confront and overcome these challenges. Notably, it addresses content conflicts and disruptions introduced by language translation, hindrances that can impede the creation of an all-encompassing and personalized learning experience. The paradigm's objective is to cultivate an educational environment that not only embraces diversity but also customizes learning experiences to suit the distinct needs of each learner. To overcome these challenges, our approach builds upon notable contributions in curriculum development and personalized learning, introducing three key innovations. These include the integration of Transformer Base Model to enhance computational efficiency, the implementation of InfoNCE Loss for accurate content-topic matching, and the adoption of a language switching strategy to alleviate translation-related ambiguities. Together, these innovations aim to collectively tackle inherent challenges and contribute to forging a more equitable and effective learning journey for a diverse range of learners. Competitive cross-validation scores underscore the efficacy of sentence-transformers/LaBSE, achieving 0.66314, showcasing our methodology's effectiveness in diverse linguistic nuances for content alignment prediction. Index Terms-Curriculum Recommendation, Transformer model with InfoNCE Loss, Language Switching.
- [1066] arXiv:2401.09716 (cross-list from cs.CV) [ pdf , other ]
-
Title: HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain GeneralizationAuthors: Guanglin Zhou , Zhongyi Han , Shiming Chen , Biwei Huang , Liming Zhu , Tongliang Liu , Lina Yao , Kun ZhangSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. In DG, the prevalent practice of constraining models to a fixed structure or uniform parameterization to encapsulate invariant features can inadvertently blend specific aspects. Such an approach struggles with nuanced differentiation of inter-domain variations and may exhibit bias towards certain domains, hindering the precise learning of domain-invariant features. Recognizing this, we introduce a novel method designed to supplement the model with domain-level and task-specific characteristics. This approach aims to guide the model in more effectively separating invariant features from specific characteristics, thereby boosting the generalization. Building on the emerging trend of visual prompts in the DG paradigm, our work introduces the novel \textbf{H}ierarchical \textbf{C}ontrastive \textbf{V}isual \textbf{P}rompt (HCVP) methodology. This represents a significant advancement in the field, setting itself apart with a unique generative approach to prompts, alongside an explicit model structure and specialized loss functions. Differing from traditional visual prompts that are often shared across entire datasets, HCVP utilizes a hierarchical prompt generation network enhanced by prompt contrastive learning. These generative prompts are instance-dependent, catering to the unique characteristics inherent to different domains and tasks. Additionally, we devise a prompt modulation network that serves as a bridge, effectively incorporating the generated visual prompts into the vision transformer backbone. Experiments conducted on five DG datasets demonstrate the effectiveness of HCVP, outperforming both established DG algorithms and adaptation protocols.
- [1067] arXiv:2401.09748 (cross-list from cs.SC) [ pdf , other ]
-
Title: Bootstrapping OTS-Funcimg Pre-training Model (Botfip) -- A Comprehensive Symbolic Regression FrameworkSubjects: Symbolic Computation (cs.SC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In the field of scientific computing, many problem-solving approaches tend to focus only on the process and final outcome, even in AI for science, there is a lack of deep multimodal information mining behind the data, missing a multimodal framework akin to that in the image-text domain. In this paper, we take Symbolic Regression(SR) as our focal point and, drawing inspiration from the BLIP model in the image-text domain, propose a scientific computing multimodal framework based on Function Images (Funcimg) and Operation Tree Sequence (OTS), named Bootstrapping OTS-Funcimg Pre-training Model (Botfip). In SR experiments, we validate the advantages of Botfip in low-complexity SR problems, showcasing its potential. As a MED framework, Botfip holds promise for future applications in a broader range of scientific computing problems.
- [1068] arXiv:2401.09756 (cross-list from cs.LG) [ pdf , other ]
-
Title: Explaining Drift using Shapley ValuesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Machine learning models often deteriorate in their performance when they are used to predict the outcomes over data on which they were not trained. These scenarios can often arise in real world when the distribution of data changes gradually or abruptly due to major events like a pandemic. There have been many attempts in machine learning research to come up with techniques that are resilient to such Concept drifts. However, there is no principled framework to identify the drivers behind the drift in model performance. In this paper, we propose a novel framework - DBShap that uses Shapley values to identify the main contributors of the drift and quantify their respective contributions. The proposed framework not only quantifies the importance of individual features in driving the drift but also includes the change in the underlying relation between the input and output as a possible driver. The explanation provided by DBShap can be used to understand the root cause behind the drift and use it to make the model resilient to the drift.
- [1069] arXiv:2401.09757 (cross-list from cs.IT) [ pdf , other ]
-
Title: Cooperative Tri-Point Model-Based Ground-to-Air Coverage Extension in Beyond 5G NetworksSubjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI)
Abstract: The utilization of existing terrestrial infrastructures to provide coverage for aerial users is a potentially low-cost solution. However, the already deployed terrestrial base stations (TBSs) result in weak ground-to-air (G2A) coverage due to the down-tilted antennas. Furthermore, achieving optimal coverage across the entire airspace through antenna adjustment is challenging due to the complex signal coverage requirements in three-dimensional space, especially in the vertical direction. In this paper, we propose a cooperative tri-point (CoTP) model-based method that utilizes cooperative beams to enhance the G2A coverage extension. To utilize existing TBSs for establishing effective cooperation, we prove that the cooperation among three TBSs can ensure G2A coverage with a minimum coverage overlap, and design the CoTP model to analyze the G2A coverage extension. Using the model, a cooperative coverage structure based on Delaunay triangulation is designed to divide triangular prism-shaped subspaces and corresponding TBS cooperation sets. To enable TBSs in the cooperation set to cover different height subspaces while maintaining ground coverage, we design a cooperative beam generation algorithm to maximize the coverage in the triangular prism-shaped airspace. The simulation results and field trials demonstrate that the proposed method can efficiently enhance the G2A coverage extension while guaranteeing ground coverage.
- [1070] arXiv:2401.09763 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: CLIP Model for Images to Textual Prompts Based on Top-k NeighborsComments: CLIP model, KNN, image-to-promptsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Text-to-image synthesis, a subfield of multimodal generation, has gained significant attention in recent years. We propose a cost-effective approach for image-to-prompt generation that leverages generative models to generate textual prompts without the need for large amounts of annotated data. We divide our method into two stages: online stage and offline stage. We use a combination of the CLIP model and K-nearest neighbors (KNN) algorithm. The proposed system consists of two main parts: an offline task and an online task. Our method owns the highest metric 0.612 among these models, which is 0.013, 0.055, 0.011 higher than Clip, Clip + KNN(top 10) respectively.
- [1071] arXiv:2401.09769 (cross-list from cs.SI) [ pdf , other ]
-
Title: Learning from Graphs with Heterophily: Progress and FutureComments: 9 pagesSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Graphs are structured data that models complex relations between real-world entities. Heterophilous graphs, where linked nodes are prone to be with different labels or dissimilar features, have recently attracted significant attention and found many applications. Meanwhile, increasing efforts have been made to advance learning from heterophilous graphs. Although there exist surveys on the relevant topic, they focus on heterophilous GNNs, which are only sub-topics of heterophilous graph learning. In this survey, we comprehensively overview existing works on learning from graphs with heterophily.First, we collect over 180 publications and introduce the development of this field. Then, we systematically categorize existing methods based on a hierarchical taxonomy including learning strategies, model architectures and practical applications. Finally, we discuss the primary challenges of existing studies and highlight promising avenues for future research.More publication details and corresponding open-source codes can be accessed and will be continuously updated at our repositories: this https URL .
- [1072] arXiv:2401.09773 (cross-list from cs.CV) [ pdf , other ]
-
Title: SEINE: Structure Encoding and Interaction Network for Nuclei Instance SegmentationComments: 10 pages, 12 figures, 6 tables, submitted to TMISubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Nuclei instance segmentation in histopathological images is of great importance for biological analysis and cancer diagnosis but remains challenging for two reasons. (1) Similar visual presentation of intranuclear and extranuclear regions of chromophobe nuclei often causes under-segmentation, and (2) current methods lack the exploration of nuclei structure, resulting in fragmented instance predictions. To address these problems, this paper proposes a structure encoding and interaction network, termed SEINE, which develops the structure modeling scheme of nuclei and exploits the structure similarity between nuclei to improve the integrality of each segmented instance. Concretely, SEINE introduces a contour-based structure encoding (SE) that considers the correlation between nuclei structure and semantics, realizing a reasonable representation of the nuclei structure. Based on the encoding, we propose a structure-guided attention (SGA) module that takes the clear nuclei as prototypes to enhance the structure learning for the fuzzy nuclei. To strengthen the structural learning ability, a semantic feature fusion (SFF) is presented to boost the semantic consistency of semantic and structure branches. Furthermore, a position enhancement (PE) method is applied to suppress incorrect nuclei boundary predictions. Extensive experiments demonstrate the superiority of our approaches, and SEINE achieves state-of-the-art (SOTA) performance on four datasets. The code is available at this https URL .
- [1073] arXiv:2401.09786 (cross-list from cs.CV) [ pdf , other ]
-
Title: Adaptive Self-training Framework for Fine-grained Scene Graph GenerationComments: 9 pages; ICLR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Scene graph generation (SGG) models have suffered from inherent problems regarding the benchmark datasets such as the long-tailed predicate distribution and missing annotation problems. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels for unannotated triplets based on which the SGG models are trained. While there has been significant progress in self-training for image recognition, designing a self-training framework for the SGG task is more challenging due to its inherent nature such as the semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM), which is a model-agnostic framework that can be applied to any existing SGG models. Furthermore, we devise a graph structure learner (GSL) that is beneficial when adopting our proposed self-training framework to the state-of-the-art message-passing neural network (MPNN)-based SGG models. Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing the performance on fine-grained predicate classes.
- [1074] arXiv:2401.09787 (cross-list from cs.LG) [ pdf , other ]
-
Title: Querying Easily Flip-flopped Samples for Deep Active LearningComments: 34 pages, 17 figures, 5 tables. Accepted to the 12th International Conference on Learning Representations (ICLR 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data. One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is. The sample's distance to the decision boundary is a natural measure of predictive uncertainty, but it is often intractable to compute, especially for complex decision boundaries formed in multiclass classification tasks. To address this issue, this paper proposes the {\it least disagree metric} (LDM), defined as the smallest probability of disagreement of the predicted label, and an estimator for LDM proven to be asymptotically consistent under mild assumptions. The estimator is computationally efficient and can be easily implemented for deep learning models using parameter perturbation. The LDM-based active learning is performed by querying unlabeled data with the smallest LDM. Experimental results show that our LDM-based active learning algorithm obtains state-of-the-art overall performance on all considered datasets and deep architectures.
- [1075] arXiv:2401.09795 (cross-list from cs.NE) [ pdf , other ]
-
Title: A Comparative Analysis on Metaheuristic Algorithms Based Vision Transformer Model for Early Detection of Alzheimer's DiseaseComments: 2023 IEEE 15th International Conference on Computational Intelligence and Communication Networks (CICN). arXiv admin note: text overlap with arXiv:2309.16796Subjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: A number of life threatening neuro-degenerative disorders had degraded the quality of life for the older generation in particular. Dementia is one such symptom which may lead to a severe condition called Alzheimer's disease if not detected at an early stage. It has been reported that the progression of such disease from a normal stage is due to the change in several parameters inside the human brain. In this paper, an innovative metaheuristic algorithms based ViT model has been proposed for the identification of dementia at different stage. A sizeable number of test data have been utilized for the validation of the proposed scheme. It has also been demonstrated that our model exhibits superior performance in terms of accuracy, precision, recall as well as F1-score.
- [1076] arXiv:2401.09798 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: All in How You Ask for It: Simple Black-Box Method for Jailbreak AttacksAuthors: Kazuhiro TakemotoComments: 12 pages, 4 figures, 3 tablesJournal-ref: Appl. Sci. 14, 3558 (2024)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Large Language Models (LLMs), such as ChatGPT, encounter `jailbreak' challenges, wherein safeguards are circumvented to generate ethically harmful prompts. This study introduces a straightforward black-box method for efficiently crafting jailbreak prompts, addressing the significant complexity and computational costs associated with conventional methods. Our technique iteratively transforms harmful prompts into benign expressions directly utilizing the target LLM, predicated on the hypothesis that LLMs can autonomously generate expressions that evade safeguards. Through experiments conducted with ChatGPT (GPT-3.5 and GPT-4) and Gemini-Pro, our method consistently achieved an attack success rate exceeding 80% within an average of five iterations for forbidden questions and proved robust against model updates. The jailbreak prompts generated were not only naturally-worded and succinct but also challenging to defend against. These findings suggest that the creation of effective jailbreak prompts is less complex than previously believed, underscoring the heightened risk posed by black-box jailbreak attacks.
- [1077] arXiv:2401.09819 (cross-list from cs.RO) [ pdf , other ]
-
Title: PPNet: A Two-Stage Neural Network for End-to-end Path PlanningSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The classical path planners, such as sampling-based path planners, can provide probabilistic completeness guarantees in the sense that the probability that the planner fails to return a solution if one exists, decays to zero as the number of samples approaches infinity. However, finding a near-optimal feasible solution in a given period is challenging in many applications such as the autonomous vehicle. To achieve an end-to-end near-optimal path planner, we first divide the path planning problem into two subproblems, which are path space segmentation and waypoints generation in the given path's space. We further propose a two-stage neural network named Path Planning Network (PPNet) each stage solves one of the subproblems abovementioned. Moreover, we propose a novel efficient data generation method for path planning named EDaGe-PP. EDaGe-PP can generate data with continuous-curvature paths with analytical expression while satisfying the clearance requirement. The results show the total computation time of generating random 2D path planning data is less than 1/33 and the success rate of PPNet trained by the dataset that is generated by EDaGe-PP is about 2 times compared to other methods. We validate PPNet against state-of-the-art path planning methods. The results show that PPNet can find a near-optimal solution in 15.3ms, which is much shorter than the state-of-the-art path planners.
- [1078] arXiv:2401.09852 (cross-list from cs.CV) [ pdf , other ]
-
Title: Enhancing the Fairness and Performance of Edge Cameras with Explainable AIAuthors: Truong Thanh Hung Nguyen , Vo Thanh Khang Nguyen , Quoc Hung Cao , Van Binh Truong , Quoc Khanh Nguyen , Hung CaoComments: IEEE ICCE 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The rising use of Artificial Intelligence (AI) in human detection on Edge camera systems has led to accurate but complex models, challenging to interpret and debug. Our research presents a diagnostic method using Explainable AI (XAI) for model debugging, with expert-driven problem identification and solution creation. Validated on the Bytetrack model in a real-world office Edge network, we found the training dataset as the main bias source and suggested model augmentation as a solution. Our approach helps identify model biases, essential for achieving fair and trustworthy models.
- [1079] arXiv:2401.09861 (cross-list from cs.CV) [ pdf , other ]
-
Title: Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language ModelsComments: 7 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have significantly enhanced the comprehension of multimedia content, bringing together diverse modalities such as text, images, and videos. However, a critical challenge faced by these models, especially when processing video inputs, is the occurrence of hallucinations - erroneous perceptions or interpretations, particularly at the event level. This study introduces an innovative method to address event-level hallucinations in MLLMs, focusing on specific temporal understanding in video content. Our approach leverages a novel framework that extracts and utilizes event-specific information from both the event query and the provided video to refine MLLMs' response. We propose a unique mechanism that decomposes on-demand event queries into iconic actions. Subsequently, we employ models like CLIP and BLIP2 to predict specific timestamps for event occurrences. Our evaluation, conducted using the Charades-STA dataset, demonstrates a significant reduction in temporal hallucinations and an improvement in the quality of event-related responses. This research not only provides a new perspective in addressing a critical limitation of MLLMs but also contributes a quantitatively measurable method for evaluating MLLMs in the context of temporal-related questions.
- [1080] arXiv:2401.09862 (cross-list from cs.NE) [ pdf , other ]
-
Title: Evolutionary Multi-Objective Optimization of Large Language Model Prompts for Balancing SentimentsComments: Accepted in EvoApps at EvoStar 2024Subjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: The advent of large language models (LLMs) such as ChatGPT has attracted considerable attention in various domains due to their remarkable performance and versatility. As the use of these models continues to grow, the importance of effective prompt engineering has come to the fore. Prompt optimization emerges as a crucial challenge, as it has a direct impact on model performance and the extraction of relevant information. Recently, evolutionary algorithms (EAs) have shown promise in addressing this issue, paving the way for novel optimization strategies. In this work, we propose a evolutionary multi-objective (EMO) approach specifically tailored for prompt optimization called EMO-Prompts, using sentiment analysis as a case study. We use sentiment analysis capabilities as our experimental targets. Our results demonstrate that EMO-Prompts effectively generates prompts capable of guiding the LLM to produce texts embodying two conflicting emotions simultaneously.
- [1081] arXiv:2401.09865 (cross-list from cs.CV) [ pdf , other ]
-
Title: Improving fine-grained understanding in image-text pre-trainingAuthors: Ioana Bica , Anastasija Ilić , Matthias Bauer , Goker Erdogan , Matko Bošnjak , Christos Kaplanis , Alexey A. Gritsenko , Matthias Minderer , Charles Blundell , Razvan Pascanu , Jovana MitrovićComments: 26 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We introduce SPARse Fine-grained Contrastive Alignment (SPARC), a simple method for pretraining more fine-grained multimodal representations from image-text pairs. Given that multiple image patches often correspond to single words, we propose to learn a grouping of image patches for every token in the caption. To achieve this, we use a sparse similarity metric between image patches and language tokens and compute for each token a language-grouped vision embedding as the weighted average of patches. The token and language-grouped vision embeddings are then contrasted through a fine-grained sequence-wise loss that only depends on individual samples and does not require other batch samples as negatives. This enables more detailed information to be learned in a computationally inexpensive manner. SPARC combines this fine-grained loss with a contrastive loss between global image and text embeddings to learn representations that simultaneously encode global and local information. We thoroughly evaluate our proposed method and show improved performance over competing approaches both on image-level tasks relying on coarse-grained information, e.g. classification, as well as region-level tasks relying on fine-grained information, e.g. retrieval, object detection, and segmentation. Moreover, SPARC improves model faithfulness and captioning in foundational vision-language models.
- [1082] arXiv:2401.09870 (cross-list from cs.LG) [ pdf , other ]
-
Title: Reconciling Spatial and Temporal Abstractions for Goal RepresentationComments: Accepted for publication in ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Goal representation affects the performance of Hierarchical Reinforcement Learning (HRL) algorithms by decomposing the complex learning problem into easier subtasks. Recent studies show that representations that preserve temporally abstract environment dynamics are successful in solving difficult problems and provide theoretical guarantees for optimality. These methods however cannot scale to tasks where environment dynamics increase in complexity i.e. the temporally abstract transition relations depend on larger number of variables. On the other hand, other efforts have tried to use spatial abstraction to mitigate the previous issues. Their limitations include scalability to high dimensional environments and dependency on prior knowledge.
In this paper, we propose a novel three-layer HRL algorithm that introduces, at different levels of the hierarchy, both a spatial and a temporal goal abstraction. We provide a theoretical study of the regret bounds of the learned policies. We evaluate the approach on complex continuous control tasks, demonstrating the effectiveness of spatial and temporal abstractions learned by this approach. - [1083] arXiv:2401.09880 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: Attention-Based Recurrent Neural Network For Automatic Behavior Laying Hen RecognitionSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: One of the interests of modern poultry farming is the vocalization of laying hens which contain very useful information on health behavior. This information is used as health and well-being indicators that help breeders better monitor laying hens, which involves early detection of problems for rapid and more effective intervention. In this work, we focus on the sound analysis for the recognition of the types of calls of the laying hens in order to propose a robust system of characterization of their behavior for a better monitoring. To do this, we first collected and annotated laying hen call signals, then designed an optimal acoustic characterization based on the combination of time and frequency domain features. We then used these features to build the multi-label classification models based on recurrent neural network to assign a semantic class to the vocalization that characterize the laying hen behavior. The results show an overall performance with our model based on the combination of time and frequency domain features that obtained the highest F1-score (F1=92.75) with a gain of 17% on the models using the frequency domain features and of 8% on the compared approaches from the litterature.
- [1084] arXiv:2401.09886 (cross-list from cs.LG) [ pdf , other ]
-
Title: Cooperative Edge Caching Based on Elastic Federated and Multi-Agent Deep Reinforcement Learning in Next-Generation NetworkComments: This paper has been submitted to IEEE TNSM. The source code has been released at: this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Edge caching is a promising solution for next-generation networks by empowering caching units in small-cell base stations (SBSs), which allows user equipments (UEs) to fetch users' requested contents that have been pre-cached in SBSs. It is crucial for SBSs to predict accurate popular contents through learning while protecting users' personal information. Traditional federated learning (FL) can protect users' privacy but the data discrepancies among UEs can lead to a degradation in model quality. Therefore, it is necessary to train personalized local models for each UE to predict popular contents accurately. In addition, the cached contents can be shared among adjacent SBSs in next-generation networks, thus caching predicted popular contents in different SBSs may affect the cost to fetch contents. Hence, it is critical to determine where the popular contents are cached cooperatively. To address these issues, we propose a cooperative edge caching scheme based on elastic federated and multi-agent deep reinforcement learning (CEFMR) to optimize the cost in the network. We first propose an elastic FL algorithm to train the personalized model for each UE, where adversarial autoencoder (AAE) model is adopted for training to improve the prediction accuracy, then {a popular} content prediction algorithm is proposed to predict the popular contents for each SBS based on the trained AAE model. Finally, we propose a multi-agent deep reinforcement learning (MADRL) based algorithm to decide where the predicted popular contents are collaboratively cached among SBSs. Our experimental results demonstrate the superiority of our proposed scheme to existing baseline caching schemes.
- [1085] arXiv:2401.09900 (cross-list from cs.CV) [ pdf , other ]
-
Title: XAI-Enhanced Semantic Segmentation Models for Visual Quality InspectionComments: IEEE ICCE 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Visual quality inspection systems, crucial in sectors like manufacturing and logistics, employ computer vision and machine learning for precise, rapid defect detection. However, their unexplained nature can hinder trust, error identification, and system improvement. This paper presents a framework to bolster visual quality inspection by using CAM-based explanations to refine semantic segmentation models. Our approach consists of 1) Model Training, 2) XAI-based Model Explanation, 3) XAI Evaluation, and 4) Annotation Augmentation for Model Enhancement, informed by explanations and expert insights. Evaluations show XAI-enhanced models surpass original DeepLabv3-ResNet101 models, especially in intricate object segmentation.
- [1086] arXiv:2401.09942 (cross-list from cs.CV) [ pdf , other ]
-
Title: Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual TrackingJournal-ref: Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports (MMSports 2023), October 29, 2023, Ottawa, ON, CanadaSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Effective tracking and re-identification of players is essential for analyzing soccer videos. But, it is a challenging task due to the non-linear motion of players, the similarity in appearance of players from the same team, and frequent occlusions. Therefore, the ability to extract meaningful embeddings to represent players is crucial in developing an effective tracking and re-identification system. In this paper, a multi-purpose part-based person representation method, called PRTreID, is proposed that performs three tasks of role classification, team affiliation, and re-identification, simultaneously. In contrast to available literature, a single network is trained with multi-task supervision to solve all three tasks, jointly. The proposed joint method is computationally efficient due to the shared backbone. Also, the multi-task learning leads to richer and more discriminative representations, as demonstrated by both quantitative and qualitative results. To demonstrate the effectiveness of PRTreID, it is integrated with a state-of-the-art tracking method, using a part-based post-processing module to handle long-term tracking. The proposed tracking method outperforms all existing tracking methods on the challenging SoccerNet tracking dataset.
- [1087] arXiv:2401.09944 (cross-list from cs.LG) [ pdf , other ]
-
Title: WindSeer: Real-time volumetric wind prediction over complex terrain aboard a small UAVAuthors: Florian Achermann , Thomas Stastny , Bogdan Danciu , Andrey Kolobov , Jen Jen Chung , Roland Siegwart , Nicholas LawranceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Real-time high-resolution wind predictions are beneficial for various applications including safe manned and unmanned aviation. Current weather models require too much compute and lack the necessary predictive capabilities as they are valid only at the scale of multiple kilometers and hours - much lower spatial and temporal resolutions than these applications require. Our work, for the first time, demonstrates the ability to predict low-altitude wind in real-time on limited-compute devices, from only sparse measurement data. We train a neural network, WindSeer, using only synthetic data from computational fluid dynamics simulations and show that it can successfully predict real wind fields over terrain with known topography from just a few noisy and spatially clustered wind measurements. WindSeer can generate accurate predictions at different resolutions and domain sizes on previously unseen topography without retraining. We demonstrate that the model successfully predicts historical wind data collected by weather stations and wind measured onboard drones.
- [1088] arXiv:2401.09964 (cross-list from cs.SE) [ pdf , other ]
-
Title: When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model InferenceComments: Accepted to ICSE24Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Leveraging recent advancements in large language models, modern neural code completion models have demonstrated the capability to generate highly accurate code suggestions. However, their massive size poses challenges in terms of computational costs and environmental impact, hindering their widespread adoption in practical scenarios. Dynamic inference emerges as a promising solution, as it allocates minimal computation during inference while maintaining the model's performance. In this research, we explore dynamic inference within the context of code completion. Initially, we conducted an empirical investigation on GPT-2, focusing on the inference capabilities of intermediate layers for code completion. We found that 54.4% of tokens can be accurately generated using just the first layer, signifying significant computational savings potential. Moreover, despite using all layers, the model still fails to predict 14.5% of tokens correctly, and the subsequent completions continued from them are rarely considered helpful, with only a 4.2% Acceptance Rate. These findings motivate our exploration of dynamic inference in code completion and inspire us to enhance it with a decision-making mechanism that stops the generation of incorrect code. We thus propose a novel dynamic inference method specifically tailored for code completion models. This method aims not only to produce correct predictions with largely reduced computation but also to prevent incorrect predictions proactively. Our extensive evaluation shows that it can averagely skip 1.7 layers out of 16 layers in the models, leading to an 11.2% speedup with only a marginal 1.1% reduction in ROUGE-L.
- [1089] arXiv:2401.09983 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: Multiobjective Optimization Analysis for Finding Infrastructure-as-Code Deployment ConfigurationsAuthors: Eneko Osaba , Josu Diaz-de-Arcaya , Juncal Alonso , Jesus L. Lobo , Gorka Benguria , Iñaki EtxanizComments: 9 pages, 1 figure, 4 tables. Paper presented in the 11th International Conference on Computer and Communications Management (ICCCM 2023)Subjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: Multiobjective optimization is a hot topic in the artificial intelligence and operations research communities. The design and development of multiobjective methods is a frequent task for researchers and practitioners. As a result of this vibrant activity, a myriad of techniques have been proposed in the literature to date, demonstrating a significant effectiveness for dealing with situations coming from a wide range of real-world areas. This paper is focused on a multiobjective problem related to optimizing Infrastructure-as-Code deployment configurations. The system implemented for solving this problem has been coined as IaC Optimizer Platform (IOP). Despite the fact that a prototypical version of the IOP has been introduced in the literature before, a deeper analysis focused on the resolution of the problem is needed, in order to determine which is the most appropriate multiobjective method for embedding in the IOP. The main motivation behind the analysis conducted in this work is to enhance the IOP performance as much as possible. This is a crucial aspect of this system, deeming that it will be deployed in a real environment, as it is being developed as part of a H2020 European project. Going deeper, we resort in this paper to nine different evolutionary computation-based multiobjective algorithms. For assessing the quality of the considered solvers, 12 different problem instances have been generated based on real-world settings. Results obtained by each method after 10 independent runs have been compared using Friedman's non-parametric tests. Findings reached from the tests carried out lad to the creation of a multi-algorithm system, capable of applying different techniques according to the user's needs.
- [1090] arXiv:2401.09986 (cross-list from cs.LG) [ pdf , other ]
-
Title: FLex&Chill: Improving Local Federated Learning Training with Logit ChillingComments: 9 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Federated learning are inherently hampered by data heterogeneity: non-iid distributed training data over local clients. We propose a novel model training approach for federated learning, FLex&Chill, which exploits the Logit Chilling method. Through extensive evaluations, we demonstrate that, in the presence of non-iid data characteristics inherent in federated learning systems, this approach can expedite model convergence and improve inference accuracy. Quantitatively, from our experiments, we observe up to 6X improvement in the global federated learning model convergence time, and up to 3.37% improvement in inference accuracy.
- [1091] arXiv:2401.09987 (cross-list from cs.RO) [ pdf , other ]
-
Title: A-KIT: Adaptive Kalman-Informed TransformerSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: The extended Kalman filter (EKF) is a widely adopted method for sensor fusion in navigation applications. A crucial aspect of the EKF is the online determination of the process noise covariance matrix reflecting the model uncertainty. While common EKF implementation assumes a constant process noise, in real-world scenarios, the process noise varies, leading to inaccuracies in the estimated state and potentially causing the filter to diverge. To cope with such situations, model-based adaptive EKF methods were proposed and demonstrated performance improvements, highlighting the need for a robust adaptive approach. In this paper, we derive and introduce A-KIT, an adaptive Kalman-informed transformer to learn the varying process noise covariance online. The A-KIT framework is applicable to any type of sensor fusion. Here, we present our approach to nonlinear sensor fusion based on an inertial navigation system and Doppler velocity log. By employing real recorded data from an autonomous underwater vehicle, we show that A-KIT outperforms the conventional EKF by more than 49.5% and model-based adaptive EKF by an average of 35.4% in terms of position accuracy.
- [1092] arXiv:2401.10016 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Gender Bias in Machine Translation and The Era of Large Language ModelsAuthors: Eva VanmassenhoveComments: 24 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: This chapter examines the role of Machine Translation in perpetuating gender bias, highlighting the challenges posed by cross-linguistic settings and statistical dependencies. A comprehensive overview of relevant existing work related to gender bias in both conventional Neural Machine Translation approaches and Generative Pretrained Transformer models employed as Machine Translation systems is provided. Through an experiment using ChatGPT (based on GPT-3.5) in an English-Italian translation context, we further assess ChatGPT's current capacity to address gender bias. The findings emphasize the ongoing need for advancements in mitigating bias in Machine Translation systems and underscore the importance of fostering fairness and inclusivity in language technologies.
- [1093] arXiv:2401.10019 (cross-list from cs.CL) [ pdf , other ]
-
Title: R-Judge: Benchmarking Safety Risk Awareness for LLM AgentsAuthors: Tongxin Yuan , Zhiwei He , Lingzhong Dong , Yiming Wang , Ruijie Zhao , Tian Xia , Lizhen Xu , Binglin Zhou , Fangqi Li , Zhuosheng Zhang , Rui Wang , Gongshen LiuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have exhibited great potential in autonomously completing tasks across real-world applications. Despite this, these LLM agents introduce unexpected safety risks when operating in interactive environments. Instead of centering on LLM-generated content safety in most prior studies, this work addresses the imperative need for benchmarking the behavioral safety of LLM agents within diverse environments. We introduce R-Judge, a benchmark crafted to evaluate the proficiency of LLMs in judging and identifying safety risks given agent interaction records. R-Judge comprises 162 records of multi-turn agent interaction, encompassing 27 key risk scenarios among 7 application categories and 10 risk types. It incorporates human consensus on safety with annotated safety labels and high-quality risk descriptions. Evaluation of 9 LLMs on R-Judge shows considerable room for enhancing the risk awareness of LLMs: The best-performing model, GPT-4, achieves 72.52% in contrast to the human score of 89.07%, while all other models score less than the random. Moreover, further experiments demonstrate that leveraging risk descriptions as environment feedback achieves substantial performance gains. With case studies, we reveal that correlated to parameter amount, risk awareness in open agent scenarios is a multi-dimensional capability involving knowledge and reasoning, thus challenging for current LLMs. R-Judge is publicly available at this https URL .
- [1094] arXiv:2401.10020 (cross-list from cs.CL) [ pdf , other ]
-
Title: Self-Rewarding Language ModelsAuthors: Weizhe Yuan , Richard Yuanzhe Pang , Kyunghyun Cho , Xian Li , Sainbayar Sukhbaatar , Jing Xu , Jason WestonSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal. Current approaches commonly train reward models from human preferences, which may then be bottlenecked by human performance level, and secondly these separate frozen reward models cannot then learn to improve during LLM training. In this work, we study Self-Rewarding Language Models, where the language model itself is used via LLM-as-a-Judge prompting to provide its own rewards during training. We show that during Iterative DPO training that not only does instruction following ability improve, but also the ability to provide high-quality rewards to itself. Fine-tuning Llama 2 70B on three iterations of our approach yields a model that outperforms many existing systems on the AlpacaEval 2.0 leaderboard, including Claude 2, Gemini Pro, and GPT-4 0613. While there is much left still to explore, this work opens the door to the possibility of models that can continually improve in both axes.
- [1095] arXiv:2401.10034 (cross-list from cs.NE) [ pdf , other ]
-
Title: Evolutionary Computation in the Era of Large Language Model: Survey and RoadmapComments: evolutionary algorithm (EA), large language model (LLM), optimization problem, prompt optimization, architecture search, code generationSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Large Language Models (LLMs) have not only revolutionized natural language processing but also extended their prowess to various domains, marking a significant stride towards artificial general intelligence. The interplay between LLMs and Evolutionary Algorithms (EAs), despite differing in objectives and methodologies, share a common pursuit of applicability in complex problems. Meanwhile, EA can provide an optimization framework for LLM's further enhancement under black-box settings, empowering LLM with flexible global search capacities. On the other hand, the abundant domain knowledge inherent in LLMs could enable EA to conduct more intelligent searches. Furthermore, the text processing and generative capabilities of LLMs would aid in deploying EAs across a wide range of tasks. Based on these complementary advantages, this paper provides a thorough review and a forward-looking roadmap, categorizing the reciprocal inspiration into two main avenues: LLM-enhanced EA and EA-enhanced LLM. Some integrated synergy methods are further introduced to exemplify the amalgamation of LLMs and EAs in diverse scenarios, including neural architecture search, code generation, software engineering, and various generation tasks. As the first comprehensive review focused on the EA research in the era of LLMs, this paper provides a foundational stepping stone for understanding the collaborative potential of LLMs and EAs. By meticulous categorization and critical analysis, we contribute to the ongoing discourse on the cross-disciplinary study of these two powerful paradigms. The identified challenges and future directions offer guidance for researchers and practitioners aiming to unlock the full potential of this innovative collaboration in propelling advancements in optimization and artificial intelligence.
- [1096] arXiv:2401.10036 (cross-list from cs.CR) [ pdf , other ]
-
Title: LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber KnowledgeAuthors: Shaswata Mitra , Subash Neupane , Trisha Chakraborty , Sudip Mittal , Aritran Piplai , Manas Gaur , Shahram RahimiSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Logic in Computer Science (cs.LO)
Abstract: Security Operations Center (SoC) analysts gather threat reports from openly accessible global threat databases and customize them manually to suit a particular organization's needs. These analysts also depend on internal repositories, which act as private local knowledge database for an organization. Credible cyber intelligence, critical operational details, and relevant organizational information are all stored in these local knowledge databases. Analysts undertake a labor intensive task utilizing these global and local knowledge databases to manually create organization's unique threat response and mitigation strategies. Recently, Large Language Models (LLMs) have shown the capability to efficiently process large diverse knowledge sources. We leverage this ability to process global and local knowledge databases to automate the generation of organization-specific threat intelligence.
In this work, we present LOCALINTEL, a novel automated knowledge contextualization system that, upon prompting, retrieves threat reports from the global threat repositories and uses its local knowledge database to contextualize them for a specific organization. LOCALINTEL comprises of three key phases: global threat intelligence retrieval, local knowledge retrieval, and contextualized completion generation. The former retrieves intelligence from global threat repositories, while the second retrieves pertinent knowledge from the local knowledge database. Finally, the fusion of these knowledge sources is orchestrated through a generator to produce a contextualized completion. - [1097] arXiv:2401.10040 (cross-list from cs.CL) [ pdf , other ]
-
Title: Large Language Models for Scientific Information Extraction: An Empirical Study for VirologyComments: 8 pages, 6 figures, Accepted as Findings of the ACL: EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Digital Libraries (cs.DL); Information Theory (cs.IT)
Abstract: In this paper, we champion the use of structured and semantic content representation of discourse-based scholarly communication, inspired by tools like Wikipedia infoboxes or structured Amazon product descriptions. These representations provide users with a concise overview, aiding scientists in navigating the dense academic landscape. Our novel automated approach leverages the robust text generation capabilities of LLMs to produce structured scholarly contribution summaries, offering both a practical solution and insights into LLMs' emergent abilities.
For LLMs, the prime focus is on improving their general intelligence as conversational agents. We argue that these models can also be applied effectively in information extraction (IE), specifically in complex IE tasks within terse domains like Science. This paradigm shift replaces the traditional modular, pipelined machine learning approach with a simpler objective expressed through instructions. Our results show that finetuned FLAN-T5 with 1000x fewer parameters than the state-of-the-art GPT-davinci is competitive for the task. - [1098] arXiv:2401.10061 (cross-list from cs.CV) [ pdf , other ]
-
Title: DiffusionGPT: LLM-Driven Text-to-Image Generation SystemAuthors: Jie Qin , Jie Wu , Weifeng Chen , Yuxi Ren , Huixia Li , Hefeng Wu , Xuefeng Xiao , Rui Wang , Shilei WenSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Diffusion models have opened up new avenues for the field of image generation, resulting in the proliferation of high-quality models shared on open-source platforms. However, a major challenge persists in current text-to-image systems are often unable to handle diverse inputs, or are limited to single model results. Current unified attempts often fall into two orthogonal aspects: i) parse Diverse Prompts in input stage; ii) activate expert model to output. To combine the best of both worlds, we propose DiffusionGPT, which leverages Large Language Models (LLM) to offer a unified generation system capable of seamlessly accommodating various types of prompts and integrating domain-expert models. DiffusionGPT constructs domain-specific Trees for various generative models based on prior knowledge. When provided with an input, the LLM parses the prompt and employs the Trees-of-Thought to guide the selection of an appropriate model, thereby relaxing input constraints and ensuring exceptional performance across diverse domains. Moreover, we introduce Advantage Databases, where the Tree-of-Thought is enriched with human feedback, aligning the model selection process with human preferences. Through extensive experiments and comparisons, we demonstrate the effectiveness of DiffusionGPT, showcasing its potential for pushing the boundaries of image synthesis in diverse domains.
- [1099] arXiv:2401.10119 (cross-list from cs.LG) [ pdf , other ]
-
Title: Towards Principled Graph TransformersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer a theoretically well-understood expressive power. However, such architectures often fail to deliver solid predictive performance on real-world tasks, limiting their practical impact. In contrast, global attention-based models such as graph transformers demonstrate strong performance in practice, but comparing their expressive power with the k-WL hierarchy remains challenging, particularly since these architectures rely on positional or structural encodings for their expressivity and predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures regarding predictive performance while not relying on positional or structural encodings.
- [1100] arXiv:2401.10139 (cross-list from cs.CV) [ pdf , other ]
-
Title: Model Compression Techniques in Biometrics Applications: A SurveyComments: Under review at IEEE JournalSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The development of deep learning algorithms has extensively empowered humanity's task automatization capacity. However, the huge improvement in the performance of these models is highly correlated with their increasing level of complexity, limiting their usefulness in human-oriented applications, which are usually deployed in resource-constrained devices. This led to the development of compression techniques that drastically reduce the computational and memory costs of deep learning models without significant performance degradation. This paper aims to systematize the current literature on this topic by presenting a comprehensive survey of model compression techniques in biometrics applications, namely quantization, knowledge distillation and pruning. We conduct a critical analysis of the comparative value of these techniques, focusing on their advantages and disadvantages and presenting suggestions for future work directions that can potentially improve the current methods. Additionally, we discuss and analyze the link between model bias and model compression, highlighting the need to direct compression research toward model fairness in future works.
- [1101] arXiv:2401.10148 (cross-list from cs.CV) [ pdf , other ]
-
Title: Explicitly Disentangled Representations in Object-Centric LearningSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Extracting structured representations from raw visual data is an important and long-standing challenge in machine learning. Recently, techniques for unsupervised learning of object-centric representations have raised growing interest. In this context, enhancing the robustness of the latent features can improve the efficiency and effectiveness of the training of downstream tasks. A promising step in this direction is to disentangle the factors that cause variation in the data. Previously, Invariant Slot Attention disentangled position, scale, and orientation from the remaining features. Extending this approach, we focus on separating the shape and texture components. In particular, we propose a novel architecture that biases object-centric models toward disentangling shape and texture components into two non-overlapping subsets of the latent space dimensions. These subsets are known a priori, hence before the training process. Experiments on a range of object-centric benchmarks reveal that our approach achieves the desired disentanglement while also numerically improving baseline performance in most cases. In addition, we show that our method can generate novel textures for a specific object or transfer textures between objects with distinct shapes.
- [1102] arXiv:2401.10158 (cross-list from cs.NI) [ pdf , other ]
-
Title: DISTINQT: A Distributed Privacy Aware Learning Framework for QoS Prediction for Future Mobile and Wireless NetworksAuthors: Nikolaos Koursioumpas , Lina Magoula , Ioannis Stavrakakis , Nancy Alonistioti , M. A. Gutierrez-Estevez , Ramin KhaliliComments: 11 Pages Double Column, 9 Figures, Submitted for possible publication in the IEEE Transactions on Vehicular Technology (IEEE TVT)Subjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Abstract: Beyond 5G and 6G networks are expected to support new and challenging use cases and applications that depend on a certain level of Quality of Service (QoS) to operate smoothly. Predicting the QoS in a timely manner is of high importance, especially for safety-critical applications as in the case of vehicular communications. Although until recent years the QoS prediction has been carried out by centralized Artificial Intelligence (AI) solutions, a number of privacy, computational, and operational concerns have emerged. Alternative solutions have been surfaced (e.g. Split Learning, Federated Learning), distributing AI tasks of reduced complexity across nodes, while preserving the privacy of the data. However, new challenges rise when it comes to scalable distributed learning approaches, taking into account the heterogeneous nature of future wireless networks. The current work proposes DISTINQT, a privacy-aware distributed learning framework for QoS prediction. Our framework supports multiple heterogeneous nodes, in terms of data types and model architectures, by sharing computations across them. This, enables the incorporation of diverse knowledge into a sole learning process that will enhance the robustness and generalization capabilities of the final QoS prediction model. DISTINQT also contributes to data privacy preservation by encoding any raw input data into a non-linear latent representation before any transmission. Evaluation results showcase that our framework achieves a statistically identical performance compared to its centralized version and an average performance improvement of up to 65% against six state-of-the-art centralized baseline solutions in the Tele-Operated Driving use case.
- [1103] arXiv:2401.10178 (cross-list from cs.CV) [ pdf , other ]
-
Title: Neural Echos: Depthwise Convolutional Filters Replicate Biological Receptive FieldsJournal-ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (2024) 8216-8225Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: In this study, we present evidence suggesting that depthwise convolutional kernels are effectively replicating the structural intricacies of the biological receptive fields observed in the mammalian retina. We provide analytics of trained kernels from various state-of-the-art models substantiating this evidence. Inspired by this intriguing discovery, we propose an initialization scheme that draws inspiration from the biological receptive fields. Experimental analysis of the ImageNet dataset with multiple CNN architectures featuring depthwise convolutions reveals a marked enhancement in the accuracy of the learned model when initialized with biologically derived weights. This underlies the potential for biologically inspired computational models to further our understanding of vision processing systems and to improve the efficacy of convolutional networks.
- [1104] arXiv:2401.10189 (cross-list from cs.CL) [ pdf , other ]
-
Title: Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text ReconstructionComments: 16 pages. Accepted by Findings of the Association for Computational Linguistics: EACL 2024. Code and resources are available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Fine-grained few-shot entity extraction in the chemical domain faces two unique challenges. First, compared with entity extraction tasks in the general domain, sentences from chemical papers usually contain more entities. Moreover, entity extraction models usually have difficulty extracting entities of long-tailed types. In this paper, we propose Chem-FINESE, a novel sequence-to-sequence (seq2seq) based few-shot entity extraction approach, to address these two challenges. Our Chem-FINESE has two components: a seq2seq entity extractor to extract named entities from the input sentence and a seq2seq self-validation module to reconstruct the original input sentence from extracted entities. Inspired by the fact that a good entity extraction system needs to extract entities faithfully, our new self-validation module leverages entity extraction results to reconstruct the original input sentence. Besides, we design a new contrastive loss to reduce excessive copying during the extraction process. Finally, we release ChemNER+, a new fine-grained chemical entity extraction dataset that is annotated by domain experts with the ChemNER schema. Experiments in few-shot settings with both ChemNER+ and CHEMET datasets show that our newly proposed framework has contributed up to 8.26% and 6.84% absolute F1-score gains respectively.
- [1105] arXiv:2401.10207 (cross-list from cs.CR) [ pdf , other ]
-
Title: Eclectic Rule Extraction for Explainability of Deep Neural Network based Intrusion Detection SystemsAuthors: Jesse Ables , Nathaniel Childers , William Anderson , Sudip Mittal , Shahram Rahimi , Ioana Banicescu , Maria SealeSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper addresses trust issues created from the ubiquity of black box algorithms and surrogate explainers in Explainable Intrusion Detection Systems (X-IDS). While Explainable Artificial Intelligence (XAI) aims to enhance transparency, black box surrogate explainers, such as Local Interpretable Model-Agnostic Explanation (LIME) and SHapley Additive exPlanation (SHAP), are difficult to trust. The black box nature of these surrogate explainers makes the process behind explanation generation opaque and difficult to understand. To avoid this problem, one can use transparent white box algorithms such as Rule Extraction (RE). There are three types of RE algorithms: pedagogical, decompositional, and eclectic. Pedagogical methods offer fast but untrustworthy white-box explanations, while decompositional RE provides trustworthy explanations with poor scalability. This work explores eclectic rule extraction, which strikes a balance between scalability and trustworthiness. By combining techniques from pedagogical and decompositional approaches, eclectic rule extraction leverages the advantages of both, while mitigating some of their drawbacks. The proposed Hybrid X-IDS architecture features eclectic RE as a white box surrogate explainer for black box Deep Neural Networks (DNN). The presented eclectic RE algorithm extracts human-readable rules from hidden layers, facilitating explainable and trustworthy rulesets. Evaluations on UNSW-NB15 and CIC-IDS-2017 datasets demonstrate the algorithm's ability to generate rulesets with 99.9% accuracy, mimicking DNN outputs. The contributions of this work include the hybrid X-IDS architecture, the eclectic rule extraction algorithm applicable to intrusion detection datasets, and a thorough analysis of performance and explainability, demonstrating the trade-offs involved in rule extraction speed and accuracy.
- [1106] arXiv:2401.10210 (cross-list from cs.CY) [ pdf , other ]
-
Title: Mastery Guided Non-parametric Clustering to Scale-up Strategy PredictionComments: Proceedings of 37th AAAI Conference on Artificial Intelligence Artificial Intelligence for Education. arXiv admin note: substantial text overlap with arXiv:2308.03892Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Predicting the strategy (sequence of concepts) that a student is likely to use in problem-solving helps Adaptive Instructional Systems (AISs) better adapt themselves to different types of learners based on their learning abilities. This can lead to a more dynamic, engaging, and personalized experience for students. To scale up training a prediction model (such as LSTMs) over large-scale education datasets, we develop a non-parametric approach to cluster symmetric instances in the data. Specifically, we learn a representation based on Node2Vec that encodes symmetries over mastery or skill level since, to solve a problem, it is natural that a student's strategy is likely to involve concepts in which they have gained mastery. Using this representation, we use DP-Means to group symmetric instances through a coarse-to-fine refinement of the clusters. We apply our model to learn strategies for Math learning from large-scale datasets from MATHia, a leading AIS for middle-school math learning. Our results illustrate that our approach can consistently achieve high accuracy using a small sample that is representative of the full dataset. Further, we show that this approach helps us learn strategies with high accuracy for students at different skill levels, i.e., leveraging symmetries improves fairness in the prediction model.
- [1107] arXiv:2401.10222 (cross-list from cs.CV) [ pdf , other ]
-
Title: Supervised Fine-tuning in turn Improves Visual Foundation ModelsComments: 23 pages, 3 figures, Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Image-text training like CLIP has dominated the pretraining of vision foundation models in recent years. Subsequent efforts have been made to introduce region-level visual learning into CLIP's pretraining but face scalability challenges due to the lack of large-scale region-level datasets. Drawing inspiration from supervised fine-tuning (SFT) in natural language processing such as instruction tuning, we explore the potential of fine-grained SFT in enhancing the generation of vision foundation models after their pretraining. Thus a two-stage method ViSFT (Vision SFT) is proposed to unleash the fine-grained knowledge of vision foundation models. In ViSFT, the vision foundation model is enhanced by performing visual joint learning on some in-domain tasks and then tested on out-of-domain benchmarks. With updating using ViSFT on 8 V100 GPUs in less than 2 days, a vision transformer with over 4.4B parameters shows improvements across various out-of-domain benchmarks including vision and vision-linguistic scenarios.
- [1108] arXiv:2401.10225 (cross-list from cs.CL) [ pdf , other ]
-
Title: ChatQA: Building GPT-4 Level Conversational QA ModelsAuthors: Zihan Liu , Wei Ping , Rajarshi Roy , Peng Xu , Chankyu Lee , Mohammad Shoeybi , Bryan CatanzaroComments: We added ChatQA-22B resultsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: In this work, we introduce ChatQA, a family of conversational question answering (QA) models that obtain GPT-4 level accuracies. Specifically, we propose a two-stage instruction tuning method that can significantly improve the zero-shot conversational QA results from large language models (LLMs). To handle retrieval-augmented generation in conversational QA, we fine-tune a dense retriever on a multi-turn QA dataset, which provides comparable results to using the state-of-the-art query rewriting model while largely reducing deployment cost. Notably, our ChatQA-70B can outperform GPT-4 in terms of average score on 10 conversational QA datasets (54.14 vs. 53.90), without relying on any synthetic data from OpenAI GPT models.
- [1109] arXiv:2401.10241 (cross-list from cs.DC) [ pdf , other ]
-
Title: Zero Bubble Pipeline ParallelismSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Pipeline parallelism is one of the key components for large-scale distributed training, yet its efficiency suffers from pipeline bubbles which were deemed inevitable. In this work, we introduce a scheduling strategy that, to our knowledge, is the first to successfully achieve zero pipeline bubbles under synchronous training semantics. The key idea behind this improvement is to split the backward computation into two parts, one that computes gradient for the input and another that computes for the parameters. Based on this idea, we handcraft novel pipeline schedules that significantly outperform the baseline methods. We further develop an algorithm that automatically finds an optimal schedule based on specific model configuration and memory limit. Additionally, to truly achieve zero bubble, we introduce a novel technique to bypass synchronizations during the optimizer step. Experimental evaluations show that our method outperforms the 1F1B schedule up to 23% in throughput under a similar memory limit. This number can be further pushed to 31% when the memory constraint is relaxed. We believe our results mark a major step forward in harnessing the true potential of pipeline parallelism. We open sourced our implementation based on the popular Megatron-LM repository on this https URL .
- [1110] arXiv:2401.10262 (cross-list from cs.CV) [ pdf , other ]
-
Title: Null Space Properties of Neural Networks with Applications to Image SteganographySubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract: This paper explores the null space properties of neural networks. We extend the null space definition from linear to nonlinear maps and discuss the presence of a null space in neural networks. The null space of a given neural network can tell us the part of the input data that makes no contribution to the final prediction so that we can use it to trick the neural network. This reveals an inherent weakness in neural networks that can be exploited. One application described here leads to a method of image steganography. Through experiments on image datasets such as MNIST, we show that we can use null space components to force the neural network to choose a selected hidden image class, even though the overall image can be made to look like a completely different image. We conclude by showing comparisons between what a human viewer would see, and the part of the image that the neural network is actually using to make predictions and, hence, show that what the neural network ``sees'' is completely different than what we would expect.
- [1111] arXiv:2401.10264 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Harnessing Transparent Learning Analytics for Individualized Support through Auto-detection of Engagement in Face-to-Face Collaborative LearningComments: 12 pages, 5 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Using learning analytics to investigate and support collaborative learning has been explored for many years. Recently, automated approaches with various artificial intelligence approaches have provided promising results for modelling and predicting student engagement and performance in collaborative learning tasks. However, due to the lack of transparency and interpretability caused by the use of "black box" approaches in learning analytics design and implementation, guidance for teaching and learning practice may become a challenge. On the one hand, the black box created by machine learning algorithms and models prevents users from obtaining educationally meaningful learning and teaching suggestions. On the other hand, focusing on group and cohort level analysis only can make it difficult to provide specific support for individual students working in collaborative groups. This paper proposes a transparent approach to automatically detect student's individual engagement in the process of collaboration. The results show that the proposed approach can reflect student's individual engagement and can be used as an indicator to distinguish students with different collaborative learning challenges (cognitive, behavioural and emotional) and learning outcomes. The potential of the proposed collaboration analytics approach for scaffolding collaborative learning practice in face-to-face contexts is discussed and future research suggestions are provided.
- [1112] arXiv:2401.10266 (cross-list from cs.LG) [ pdf , other ]
-
Title: Intelligent Condition Monitoring of Industrial Plants: An Overview of Methodologies and Uncertainty Management StrategiesAuthors: Maryam Ahang , Todd Charter , Oluwaseyi Ogunfowora , Maziyar Khadivi , Mostafa Abbasi , Homayoun NajjaranSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Systems and Control (eess.SY)
Abstract: Condition monitoring plays a significant role in the safety and reliability of modern industrial systems. Artificial intelligence (AI) approaches are gaining attention from academia and industry as a growing subject in industrial applications and as a powerful way of identifying faults. This paper provides an overview of intelligent condition monitoring and fault detection and diagnosis methods for industrial plants with a focus on the open-source benchmark Tennessee Eastman Process (TEP). In this survey, the most popular and state-of-the-art deep learning (DL) and machine learning (ML) algorithms for industrial plant condition monitoring, fault detection, and diagnosis are summarized and the advantages and disadvantages of each algorithm are studied. Challenges like imbalanced data, unlabelled samples and how deep learning models can handle them are also covered. Finally, a comparison of the accuracies and specifications of different algorithms utilizing the Tennessee Eastman Process (TEP) is conducted. This research will be beneficial for both researchers who are new to the field and experts, as it covers the literature on condition monitoring and state-of-the-art methods alongside the challenges and possible solutions to them.
- [1113] arXiv:2401.10267 (cross-list from cs.AR) [ pdf , other ]
-
Title: HyperSense: Accelerating Hyper-Dimensional Computing for Intelligent Sensor Data ProcessingAuthors: Sanggeon Yun , Hanning Chen , Ryozo Masukawa , Hamza Errahmouni Barkam , Andrew Ding , Wenjun Huang , Arghavan Rezvani , Shaahin Angizi , Mohsen ImaniSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: Introducing HyperSense, our co-designed hardware and software system efficiently controls Analog-to-Digital Converter (ADC) modules' data generation rate based on object presence predictions in sensor data. Addressing challenges posed by escalating sensor quantities and data rates, HyperSense reduces redundant digital data using energy-efficient low-precision ADC, diminishing machine learning system costs. Leveraging neurally-inspired HyperDimensional Computing (HDC), HyperSense analyzes real-time raw low-precision sensor data, offering advantages in handling noise, memory-centricity, and real-time learning.
Our proposed HyperSense model combines high-performance software for object detection with real-time hardware prediction, introducing the novel concept of Intelligent Sensor Control. Comprehensive software and hardware evaluations demonstrate our solution's superior performance, evidenced by the highest Area Under the Curve (AUC) and sharpest Receiver Operating Characteristic (ROC) curve among lightweight models. Hardware-wise, our FPGA-based domain-specific accelerator tailored for HyperSense achieves a 5.6x speedup compared to YOLOv4 on NVIDIA Jetson Orin while showing up to 92.1% energy saving compared to the conventional system. These results underscore HyperSense's effectiveness and efficiency, positioning it as a promising solution for intelligent sensing and real-time data processing across diverse applications. - [1114] arXiv:2401.10268 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: The complementary contributions of academia and industry to AI researchAuthors: Lizhen Liang (Syracuse University), Han Zhuang (Northeastern University), James Zou (Stanford University), Daniel E. Acuna (University of Colorado at Boulder)Comments: 28 pages, 7 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Artificial intelligence (AI) has seen tremendous development in industry and academia. However, striking recent advances by industry have stunned the world, inviting a fresh perspective on the role of academic research in this field. Here, we characterize the impact and type of AI produced by both environments over the last 25 years and establish several patterns. We find that articles published by teams consisting exclusively of industry researchers tend to get greater attention, with a higher chance of being highly cited and citation-disruptive, and several times more likely to produce state-of-the-art models. In contrast, we find that exclusively academic teams publish the bulk of AI research and tend to produce higher novelty work, with single papers having several times higher likelihood of being unconventional and atypical. The respective impact-novelty advantages of industry and academia are robust to controls for subfield, team size, seniority, and prestige. We find that academic-industry collaborations struggle to replicate the novelty of academic teams and tend to look similar to industry teams. Together, our findings identify the unique and nearly irreplaceable contributions that both academia and industry make toward the healthy progress of AI.
- [1115] arXiv:2401.10271 (cross-list from cs.DB) [ pdf , other ]
-
Title: Querying Triadic Concepts through Partial or Complete Matching of TriplesSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we introduce a new method for querying triadic concepts through partial or complete matching of triples using an inverted index, to retrieve already computed triadic concepts that contain a set of terms in their extent, intent, and/or modus. As opposed to the approximation approach described in Ananias, this method (i) does not need to keep the initial triadic context or its three dyadic counterparts, (ii) avoids the application of derivation operators on the triple components through context exploration, and (iii) eliminates the requirement for a factorization phase to get triadic concepts as the answer to one-dimensional queries. Additionally, our solution introduces a novel metric for ranking the retrieved triadic concepts based on their similarity to a given query. Lastly, an empirical study is primarily done to illustrate the effectiveness and scalability of our approach against the approximation one. Our solution not only showcases superior efficiency, but also highlights a better scalability, making it suitable for big data scenarios.
- [1116] arXiv:2401.10272 (cross-list from cs.CV) [ pdf , other ]
-
Title: Multi-Source Collaborative Gradient Discrepancy Minimization for Federated Domain GeneralizationComments: Accepted by AAAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Federated Domain Generalization aims to learn a domain-invariant model from multiple decentralized source domains for deployment on unseen target domain. Due to privacy concerns, the data from different source domains are kept isolated, which poses challenges in bridging the domain gap. To address this issue, we propose a Multi-source Collaborative Gradient Discrepancy Minimization (MCGDM) method for federated domain generalization. Specifically, we propose intra-domain gradient matching between the original images and augmented images to avoid overfitting the domain-specific information within isolated domains. Additionally, we propose inter-domain gradient matching with the collaboration of other domains, which can further reduce the domain shift across decentralized domains. Combining intra-domain and inter-domain gradient matching, our method enables the learned model to generalize well on unseen domains. Furthermore, our method can be extended to the federated domain adaptation task by fine-tuning the target model on the pseudo-labeled target domain. The extensive experiments on federated domain generalization and adaptation indicate that our method outperforms the state-of-the-art methods significantly.
- [1117] arXiv:2401.10273 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Revolutionizing Pharma: Unveiling the AI and LLM Trends in the Pharmaceutical IndustrySubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: This document offers a critical overview of the emerging trends and significant advancements in artificial intelligence (AI) within the pharmaceutical industry. Detailing its application across key operational areas, including research and development, animal testing, clinical trials, hospital clinical stages, production, regulatory affairs, quality control and other supporting areas, the paper categorically examines AI's role in each sector. Special emphasis is placed on cutting-edge AI technologies like machine learning algorithms and their contributions to various aspects of pharmaceutical operations. Through this comprehensive analysis, the paper highlights the transformative potential of AI in reshaping the pharmaceutical industry's future.
- [1118] arXiv:2401.10274 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: Knowledge-Assisted Dual-Stage Evolutionary Optimization of Large-Scale Crude Oil SchedulingSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: With the scaling up of crude oil scheduling in modern refineries, large-scale crude oil scheduling problems (LSCOSPs) emerge with thousands of binary variables and non-linear constraints, which are challenging to be optimized by traditional optimization methods. To solve LSCOSPs, we take the practical crude oil scheduling from a marine-access refinery as an example and start with modeling LSCOSPs from crude unloading, transportation, crude distillation unit processing, and inventory management of intermediate products. On the basis of the proposed model, a dual-stage evolutionary algorithm driven by heuristic rules (denoted by DSEA/HR) is developed, where the dual-stage search mechanism consists of global search and local refinement. In the global search stage, we devise several heuristic rules based on the empirical operating knowledge to generate a well-performing initial population and accelerate convergence in the mixed variables space. In the local refinement stage, a repair strategy is proposed to move the infeasible solutions towards feasible regions by further optimizing the local continuous variables. During the whole evolutionary process, the proposed dual-stage framework plays a crucial role in balancing exploration and exploitation. Experimental results have shown that DSEA/HR outperforms the state-of-the-art and widely-used mathematical programming methods and metaheuristic algorithms on LSCOSP instances within a reasonable time.
- [1119] arXiv:2401.10279 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: A systematic review of geospatial location embedding approaches in large language models: A path to spatial AI systemsAuthors: Sean TuckerComments: 20 pages, 11 figures, 3 appendicesSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Geospatial Location Embedding (GLE) helps a Large Language Model (LLM) assimilate and analyze spatial data. GLE emergence in Geospatial Artificial Intelligence (GeoAI) is precipitated by the need for deeper geospatial awareness in our complex contemporary spaces and the success of LLMs in extracting deep meaning in Generative AI. We searched Google Scholar, Science Direct, and arXiv for papers on geospatial location embedding and LLM and reviewed articles focused on gaining deeper spatial "knowing" through LLMs. We screened 304 titles, 30 abstracts, and 18 full-text papers that reveal four GLE themes - Entity Location Embedding (ELE), Document Location Embedding (DLE), Sequence Location Embedding (SLE), and Token Location Embedding (TLE). Synthesis is tabular and narrative, including a dialogic conversation between "Space" and "LLM." Though GLEs aid spatial understanding by superimposing spatial data, they emphasize the need to advance in the intricacies of spatial modalities and generalized reasoning. GLEs signal the need for a Spatial Foundation/Language Model (SLM) that embeds spatial knowing within the model architecture. The SLM framework advances Spatial Artificial Intelligence Systems (SPAIS), establishing a Spatial Vector Space (SVS) that maps to physical space. The resulting spatially imbued Language Model is unique. It simultaneously represents actual space and an AI-capable space, paving the way for AI native geo storage, analysis, and multi-modality as the basis for Spatial Artificial Intelligence Systems (SPAIS).
- [1120] arXiv:2401.10286 (cross-list from cs.CL) [ pdf , other ]
-
Title: Code-Based English Models Surprising Performance on Chinese QA Pair Extraction TaskAuthors: Linghan Zheng , Hui Liu , Xiaojun Lin , Jiayuan Dong , Yue Sheng , Gang Shi , Zhiwei Liu , Hongwei ChenSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In previous studies, code-based models have consistently outperformed text-based models in reasoning-intensive scenarios. When generating our knowledge base for Retrieval-Augmented Generation (RAG), we observed that code-based models also perform exceptionally well in Chinese QA Pair Extraction task. Further, our experiments and the metrics we designed discovered that code-based models containing a certain amount of Chinese data achieve even better performance. Additionally, the capabilities of code-based English models in specified Chinese tasks offer a distinct perspective for discussion on the philosophical "Chinese Room" thought experiment.
- [1121] arXiv:2401.10289 (cross-list from cs.ET) [ pdf , ps , other ]
-
Title: Design and development of opto-neural processors for simulation of neural networks trained in image detection for potential implementation in hybrid roboticsAuthors: Sanjana ShettySubjects: Emerging Technologies (cs.ET) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: Neural networks have been employed for a wide range of processing applications like image processing, motor control, object detection and many others. Living neural networks offer advantages of lower power consumption, faster processing, and biological realism. Optogenetics offers high spatial and temporal control over biological neurons and presents potential in training live neural networks. This work proposes a simulated living neural network trained indirectly by backpropagating STDP based algorithms using precision activation by optogenetics achieving accuracy comparable to traditional neural network training algorithms.
- [1122] arXiv:2401.10299 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: An attempt to generate new bridge types from latent space of generative flowAuthors: Hongjun ZhangComments: 10 pages, 11 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Through examples of coordinate and probability transformation between different distributions, the basic principle of normalizing flow is introduced in a simple and concise manner. From the perspective of the distribution of random variable function, the essence of probability transformation is explained, and the scaling factor Jacobian determinant of probability transformation is introduced. Treating the dataset as a sample from the population, obtaining normalizing flow is essentially through sampling surveys to statistically infer the numerical features of the population, and then the loss function is established by using the maximum likelihood estimation method. This article introduces how normalizing flow cleverly solves the two major application challenges of high-dimensional matrix determinant calculation and neural network reversible transformation. Using symmetric structured image dataset of three-span beam bridge, arch bridge, cable-stayed bridge and suspension bridge, constructing and training normalizing flow based on the Glow API in the TensorFlow Probability library. The model can smoothly transform the complex distribution of the bridge dataset into a standard normal distribution, and from the obtained latent space sampling, it can generate new bridge types that are different from the training dataset.
- [1123] arXiv:2401.10300 (cross-list from cs.MA) [ pdf , other ]
-
Title: A Hierarchical Framework with Spatio-Temporal Consistency Learning for Emergence Detection in Complex Adaptive SystemsComments: 18 pages, under reviewSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Emergence, a global property of complex adaptive systems (CASs) constituted by interactive agents, is prevalent in real-world dynamic systems, e.g., network-level traffic congestions. Detecting its formation and evaporation helps to monitor the state of a system, allowing to issue a warning signal for harmful emergent phenomena. Since there is no centralized controller of CAS, detecting emergence based on each agent's local observation is desirable but challenging. Existing works are unable to capture emergence-related spatial patterns, and fail to model the nonlinear relationships among agents. This paper proposes a hierarchical framework with spatio-temporal consistency learning to solve these two problems by learning the system representation and agent representations, respectively. Especially, spatio-temporal encoders are tailored to capture agents' nonlinear relationships and the system's complex evolution. Representations of the agents and the system are learned by preserving the intrinsic spatio-temporal consistency in a self-supervised manner. Our method achieves more accurate detection than traditional methods and deep learning methods on three datasets with well-known yet hard-to-detect emergent behaviors. Notably, our hierarchical framework is generic, which can employ other deep learning methods for agent-level and system-level detection.
- [1124] arXiv:2401.10304 (cross-list from cs.LG) [ pdf , other ]
-
Title: On the Readiness of Scientific Data for a Fair and Transparent Use in Machine LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Digital Libraries (cs.DL)
Abstract: To ensure the fairness and trustworthiness of machine learning (ML) systems, recent legislative initiatives and relevant research in the ML community have pointed out the need to document the data used to train ML models. Besides, data-sharing practices in many scientific domains have evolved in recent years for reproducibility purposes. In this sense, the adoption of these practices by academic institutions has encouraged researchers to publish their data and technical documentation in peer-reviewed publications such as data papers. In this study, we analyze how this scientific data documentation meets the needs of the ML community and regulatory bodies for its use in ML technologies. We examine a sample of 4041 data papers of different domains, assessing their completeness and coverage of the requested dimensions, and trends in recent years, putting special emphasis on the most and least documented dimensions. As a result, we propose a set of recommendation guidelines for data creators and scientific data publishers to increase their data's preparedness for its transparent and fairer use in ML technologies.
- [1125] arXiv:2401.10310 (cross-list from cs.LG) [ pdf , other ]
-
Title: Mathematical Algorithm Design for Deep Learning under Societal and Judicial Constraints: The Algorithmic Transparency RequirementSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Complexity (cs.CC)
Abstract: Deep learning still has drawbacks in terms of trustworthiness, which describes a comprehensible, fair, safe, and reliable method. To mitigate the potential risk of AI, clear obligations associated to trustworthiness have been proposed via regulatory guidelines, e.g., in the European AI Act. Therefore, a central question is to what extent trustworthy deep learning can be realized. Establishing the described properties constituting trustworthiness requires that the factors influencing an algorithmic computation can be retraced, i.e., the algorithmic implementation is transparent. Motivated by the observation that the current evolution of deep learning models necessitates a change in computing technology, we derive a mathematical framework which enables us to analyze whether a transparent implementation in a computing model is feasible. We exemplarily apply our trustworthiness framework to analyze deep learning approaches for inverse problems in digital and analog computing models represented by Turing and Blum-Shub-Smale Machines, respectively. Based on previous results, we find that Blum-Shub-Smale Machines have the potential to establish trustworthy solvers for inverse problems under fairly general conditions, whereas Turing machines cannot guarantee trustworthiness to the same degree.
- [1126] arXiv:2401.10314 (cross-list from cs.SE) [ pdf , other ]
-
Title: LangProp: A code optimization framework using Language Models applied to drivingAuthors: Shu Ishida , Gianluca Corrado , George Fedoseev , Hudson Yeo , Lloyd Russell , Jamie Shotton , João F. Henriques , Anthony HuSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: LangProp is a framework for iteratively optimizing code generated by large language models (LLMs) in a supervised/reinforcement learning setting. While LLMs can generate sensible solutions zero-shot, the solutions are often sub-optimal. Especially for code generation tasks, it is likely that the initial code will fail on certain edge cases. LangProp automatically evaluates the code performance on a dataset of input-output pairs, as well as catches any exceptions, and feeds the results back to the LLM in the training loop, so that the LLM can iteratively improve the code it generates. By adopting a metric- and data-driven training paradigm for this code optimization procedure, one could easily adapt findings from traditional machine learning techniques such as imitation learning, DAgger, and reinforcement learning. We demonstrate the first proof of concept of automated code optimization for autonomous driving in CARLA, showing that LangProp can generate interpretable and transparent driving policies that can be verified and improved in a metric- and data-driven way. Our code will be open-sourced and is available at this https URL .
- [1127] arXiv:2401.10316 (cross-list from cs.IR) [ pdf , other ]
-
Title: Improving One-class Recommendation with Multi-tasking on Various Preference IntensitiesComments: RecSys 2020 (ACM Conference on Recommender Systems 2020)Journal-ref: RecSys 2020: Proceedings of the 14th ACM Conference on Recommender Systems, Pages 498 to 502Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In the one-class recommendation problem, it's required to make recommendations basing on users' implicit feedback, which is inferred from their action and inaction. Existing works obtain representations of users and items by encoding positive and negative interactions observed from training data. However, these efforts assume that all positive signals from implicit feedback reflect a fixed preference intensity, which is not realistic. Consequently, representations learned with these methods usually fail to capture informative entity features that reflect various preference intensities.
In this paper, we propose a multi-tasking framework taking various preference intensities of each signal from implicit feedback into consideration. Representations of entities are required to satisfy the objective of each subtask simultaneously, making them more robust and generalizable. Furthermore, we incorporate attentive graph convolutional layers to explore high-order relationships in the user-item bipartite graph and dynamically capture the latent tendencies of users toward the items they interact with. Experimental results show that our method performs better than state-of-the-art methods by a large margin on three large-scale real-world benchmark datasets. - [1128] arXiv:2401.10337 (cross-list from cs.LG) [ pdf , other ]
-
Title: Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern RecognitionComments: accepted at EACL 2024, in ARR October 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Abstract: Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain, described encyclopedically in textual knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP mapping, is an important and challenging task. Conventional learning approaches often target the problem in the classical multi-class or multilabel classification setting. This setting hinders the learning ability of the model due to a large number of classes (i.e., TTPs), the inevitable skewness of the label distribution and the complex hierarchical structure of the label space. We formulate the problem in a different learning paradigm, where the assignment of a text to a TTP label is decided by the direct semantic similarity between the two, thus reducing the complexity of competing solely over the large labeling space. To that end, we propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism, facilitating the learning process of the matching model despite constrained resources.
- [1129] arXiv:2401.10341 (cross-list from cs.CV) [ pdf , other ]
-
Title: ELRT: Efficient Low-Rank Training for Compact Convolutional Neural NetworksSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Low-rank compression, a popular model compression technique that produces compact convolutional neural networks (CNNs) with low rankness, has been well-studied in the literature. On the other hand, low-rank training, as an alternative way to train low-rank CNNs from scratch, has been exploited little yet. Unlike low-rank compression, low-rank training does not need pre-trained full-rank models, and the entire training phase is always performed on the low-rank structure, bringing attractive benefits for practical applications. However, the existing low-rank training solutions still face several challenges, such as a considerable accuracy drop and/or still needing to update full-size models during the training. In this paper, we perform a systematic investigation on low-rank CNN training. By identifying the proper low-rank format and performance-improving strategy, we propose ELRT, an efficient low-rank training solution for high-accuracy, high-compactness, low-rank CNN models. Our extensive evaluation results for training various CNNs on different datasets demonstrate the effectiveness of ELRT.
- [1130] arXiv:2401.10359 (cross-list from cs.SE) [ pdf , other ]
-
Title: Keeping Deep Learning Models in Check: A History-Based Approach to Mitigate OverfittingSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: In software engineering, deep learning models are increasingly deployed for critical tasks such as bug detection and code review. However, overfitting remains a challenge that affects the quality, reliability, and trustworthiness of software systems that utilize deep learning models. Overfitting can be (1) prevented (e.g., using dropout or early stopping) or (2) detected in a trained model (e.g., using correlation-based approaches). Both overfitting detection and prevention approaches that are currently used have constraints (e.g., requiring modification of the model structure, and high computing resources). In this paper, we propose a simple, yet powerful approach that can both detect and prevent overfitting based on the training history (i.e., validation losses). Our approach first trains a time series classifier on training histories of overfit models. This classifier is then used to detect if a trained model is overfit. In addition, our trained classifier can be used to prevent overfitting by identifying the optimal point to stop a model's training. We evaluate our approach on its ability to identify and prevent overfitting in real-world samples. We compare our approach against correlation-based detection approaches and the most commonly used prevention approach (i.e., early stopping). Our approach achieves an F1 score of 0.91 which is at least 5% higher than the current best-performing non-intrusive overfitting detection approach. Furthermore, our approach can stop training to avoid overfitting at least 32% of the times earlier than early stopping and has the same or a better rate of returning the best model.
- [1131] arXiv:2401.10361 (cross-list from cs.LG) [ pdf , other ]
-
Title: Hierarchical Federated Learning in Multi-hop Cluster-Based VANETsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
Abstract: The usage of federated learning (FL) in Vehicular Ad hoc Networks (VANET) has garnered significant interest in research due to the advantages of reducing transmission overhead and protecting user privacy by communicating local dataset gradients instead of raw data. However, implementing FL in VANETs faces challenges, including limited communication resources, high vehicle mobility, and the statistical diversity of data distributions. In order to tackle these issues, this paper introduces a novel framework for hierarchical federated learning (HFL) over multi-hop clustering-based VANET. The proposed method utilizes a weighted combination of the average relative speed and cosine similarity of FL model parameters as a clustering metric to consider both data diversity and high vehicle mobility. This metric ensures convergence with minimum changes in cluster heads while tackling the complexities associated with non-independent and identically distributed (non-IID) data scenarios. Additionally, the framework includes a novel mechanism to manage seamless transitions of cluster heads (CHs), followed by transferring the most recent FL model parameter to the designated CH. Furthermore, the proposed approach considers the option of merging CHs, aiming to reduce their count and, consequently, mitigate associated overhead. Through extensive simulations, the proposed hierarchical federated learning over clustered VANET has been demonstrated to improve accuracy and convergence time significantly while maintaining an acceptable level of packet overhead compared to previously proposed clustering algorithms and non-clustered VANET.
- [1132] arXiv:2401.10372 (cross-list from cs.SE) [ pdf , other ]
-
Title: MutaBot: A Mutation Testing Approach for ChatbotsComments: 5 pages, 2 figures, 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion '24)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Mutation testing is a technique aimed at assessing the effectiveness of test suites by seeding artificial faults into programs. Although available for many platforms and languages, no mutation testing tool is currently available for conversational chatbots, which represent an increasingly popular solution to design systems that can interact with users through a natural language interface. Note that since conversations must be explicitly engineered by the developers of conversational chatbots, these systems are exposed to specific types of faults not supported by existing mutation testing tools.
In this paper, we present MutaBot, a mutation testing tool for conversational chatbots. MutaBot addresses mutations at multiple levels, including conversational flows, intents, and contexts. We designed the tool to potentially target multiple platforms, while we implemented initial support for Google Dialogflow chatbots. We assessed the tool with three Dialogflow chatbots and test cases generated with Botium, revealing weaknesses in the test suites. - [1133] arXiv:2401.10379 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Agricultural Object Detection with You Look Only Once (YOLO) Algorithm: A Bibliometric and Systematic Literature ReviewSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Vision is a major component in several digital technologies and tools used in agriculture. The object detector, You Look Only Once (YOLO), has gained popularity in agriculture in a relatively short span due to its state-of-the-art performance. YOLO offers real-time detection with good accuracy and is implemented in various agricultural tasks, including monitoring, surveillance, sensing, automation, and robotics. The research and application of YOLO in agriculture are accelerating rapidly but are fragmented and multidisciplinary. Moreover, the performance characteristics (i.e., accuracy, speed, computation) of the object detector influence the rate of technology implementation and adoption in agriculture. Thus, the study aims to collect extensive literature to document and critically evaluate the advances and application of YOLO for agricultural object recognition. First, we conducted a bibliometric review of 257 articles to understand the scholarly landscape of YOLO in agricultural domain. Secondly, we conducted a systematic review of 30 articles to identify current knowledge, gaps, and modifications in YOLO for specific agricultural tasks. The study critically assesses and summarizes the information on YOLO's end-to-end learning approach, including data acquisition, processing, network modification, integration, and deployment. We also discussed task-specific YOLO algorithm modification and integration to meet the agricultural object or environment-specific challenges. In general, YOLO-integrated digital tools and technologies show the potential for real-time, automated monitoring, surveillance, and object handling to reduce labor, production cost, and environmental impact while maximizing resource efficiency. The study provides detailed documentation and significantly advances the existing knowledge on applying YOLO in agriculture, which can greatly benefit the scientific community.
- [1134] arXiv:2401.10393 (cross-list from cs.LG) [ pdf , other ]
-
Title: Catastrophic Interference is Mitigated in Naturalistic Power-Law Learning EnvironmentsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Neural networks often suffer from catastrophic interference (CI): performance on previously learned tasks drops off significantly when learning a new task. This contrasts strongly with humans, who can sequentially learn new tasks without appreciably forgetting previous tasks. Prior work has explored various techniques for mitigating CI such as regularization, rehearsal, generative replay, and distillation methods. The current work takes a different approach, one guided by cognitive science research showing that in naturalistic environments, the probability of encountering a task decreases as a power-law of the time since it was last performed. We argue that a realistic evaluation of techniques for the mitigation of CI should be performed in simulated naturalistic learning environments. Thus, we evaluate the extent of mitigation of CI when training simple rehearsal-based methods in power-law environments similar to the ones humans face. Our work explores this novel rehearsal-based approach for a domain-incremental task: learning permutations in the MNIST task. We compare our rehearsal environment with other baselines to show its efficacy in promoting continual learning. Additionally, we investigate whether this environment shows forward facilitation, i.e., faster learning of later tasks. Next, we explore the robustness of our learning environment to the number of tasks, model size, and amount of data rehearsed after each task. Notably, our results show that the performance is comparable or superior to that of models trained using popular regularization methods and also to rehearsals in non-power-law environments. The benefits of this training paradigm include simplicity and the lack of a need for extra neural circuitry. In addition, because our method is orthogonal to other methods, future research can combine training in power-law environments with other continual learning mechanisms.
- [1135] arXiv:2401.10394 (cross-list from cs.LG) [ pdf , other ]
-
Title: Distribution Consistency based Self-Training for Graph Neural Networks with Sparse LabelsComments: Accepted by WSDM 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Few-shot node classification poses a significant challenge for Graph Neural Networks (GNNs) due to insufficient supervision and potential distribution shifts between labeled and unlabeled nodes. Self-training has emerged as a widely popular framework to leverage the abundance of unlabeled data, which expands the training set by assigning pseudo-labels to selected unlabeled nodes. Efforts have been made to develop various selection strategies based on confidence, information gain, etc. However, none of these methods takes into account the distribution shift between the training and testing node sets. The pseudo-labeling step may amplify this shift and even introduce new ones, hindering the effectiveness of self-training. Therefore, in this work, we explore the potential of explicitly bridging the distribution shift between the expanded training set and test set during self-training. To this end, we propose a novel Distribution-Consistent Graph Self-Training (DC-GST) framework to identify pseudo-labeled nodes that are both informative and capable of redeeming the distribution discrepancy and formulate it as a differentiable optimization task. A distribution-shift-aware edge predictor is further adopted to augment the graph and increase the model's generalizability in assigning pseudo labels. We evaluate our proposed method on four publicly available benchmark datasets and extensive experiments demonstrate that our framework consistently outperforms state-of-the-art baselines.
- [1136] arXiv:2401.10415 (cross-list from cs.CL) [ pdf , other ]
-
Title: Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this work, we investigate the controllability of large language models (LLMs) on scientific summarization tasks. We identify key stylistic and content coverage factors that characterize different types of summaries such as paper reviews, abstracts, and lay summaries. By controlling stylistic features, we find that non-fine-tuned LLMs outperform humans in the MuP review generation task, both in terms of similarity to reference summaries and human preferences. Also, we show that we can improve the controllability of LLMs with keyword-based classifier-free guidance (CFG) while achieving lexical overlap comparable to strong fine-tuned baselines on arXiv and PubMed. However, our results also indicate that LLMs cannot consistently generate long summaries with more than 8 sentences. Furthermore, these models exhibit limited capacity to produce highly abstractive lay summaries. Although LLMs demonstrate strong generic summarization competency, sophisticated content control without costly fine-tuning remains an open problem for domain-specific applications.
- [1137] arXiv:2401.10446 (cross-list from cs.CL) [ pdf , other ]
-
Title: Large Language Models are Efficient Learners of Noise-Robust Speech RecognitionAuthors: Yuchen Hu , Chen Chen , Chao-Han Huck Yang , Ruizhe Li , Chao Zhang , Pin-Yu Chen , EnSiong ChngComments: Accepted to ICLR 2024, Spotlight top 5%, 24 pages. This work will be open sourced at: this https URL under MIT licenseSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic knowledge and powerful reasoning ability of LLMs to improve recognition results. The latest work proposes a GER benchmark with HyPoradise dataset to learn the mapping from ASR N-best hypotheses to ground-truth transcription by efficient LLM finetuning, which shows great effectiveness but lacks specificity on noise-robust ASR. In this work, we extend the benchmark to noisy conditions and investigate if we can teach LLMs to perform denoising for GER just like what robust ASR do}, where one solution is introducing noise information as a conditioner into LLM. However, directly incorporating noise embeddings from audio encoder could harm the LLM tuning due to cross-modality gap. To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of source speech, which can promote the denoising process in GER. Furthermore, in order to enhance its representation ability of audio noise, we design a knowledge distillation (KD) approach via mutual information estimation to distill the real noise information in audio embeddings to our language embedding. Experiments on various latest LLMs demonstrate our approach achieves a new breakthrough with up to 53.9% correction improvement in terms of word error rate while with limited training data. Analysis shows that our language-space noise embedding can well represent the noise conditions of source speech, under which off-the-shelf LLMs show strong ability of language-space denoising.
- [1138] arXiv:2401.10447 (cross-list from cs.CL) [ pdf , other ]
-
Title: Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech RecognitionAuthors: Yu Yu , Chao-Han Huck Yang , Tuan Dinh , Sungho Ryu , Jari Kolehmainen , Roger Ren , Denis Filimonov , Prashanth G. Shivakumar , Ankur Gandhe , Ariya Rastow , Jia Xu , Ivan Bulyko , Andreas StolckeSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasing popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50\% on the public Librispeech dataset and of 3.67\% on an internal dataset in the messaging domain. To further characterize the stability of LoRA-based second-pass speech recognition models, we examine robustness against input perturbations. These perturbations are rooted in homophone replacements and a novel metric called N-best Perturbation-based Rescoring Robustness (NPRR), both designed to measure the relative degradation in the performance of rescoring models. Our experimental results indicate that while advanced variants of LoRA, such as dynamic rank-allocated LoRA, lead to performance degradation in $1$-best perturbation, they alleviate the degradation in $N$-best perturbation. This finding is in comparison to fully-tuned models and vanilla LoRA tuning baselines, suggesting that a comprehensive selection is needed when using LoRA-based adaptation for compute-cost savings and robust language modeling.
- [1139] arXiv:2401.10463 (cross-list from cs.CL) [ pdf , other ]
-
Title: Critical Data Size of Language Models from a Grokking PerspectiveSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We explore the critical data size in language models, a threshold that marks a fundamental shift from quick memorization to slow generalization. We formalize the phase transition under the grokking configuration into the Data Efficiency Hypothesis and identify data insufficiency, sufficiency, and surplus regimes in language models training dynamics. We develop a grokking configuration to reproduce grokking on simplistic language models stably by rescaling initialization and weight decay. We show that generalization occurs only when language models reach a critical size. We analyze grokking across sample-wise and model-wise, verifying the proposed data efficiency hypothesis. Our experiments reveal smoother phase transitions occurring at the critical dataset size for language datasets. As the model size increases, this critical point also becomes larger, indicating that larger models require more data. Our results deepen the understanding of language model training, offering a novel perspective on the role of data in the learning mechanism of language models.
- [1140] arXiv:2401.10471 (cross-list from cs.CL) [ pdf , other ]
-
Title: DeepEdit: Knowledge Editing as Decoding with ConstraintsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We propose a new perspective of knowledge editing (KE) for large language models (LLMs) that treats it as a constrained decoding problem. We design decoding constraints to regulate LLMs, ensuring coherence between reasoning steps when incorporating new knowledge. To enforce these constraints, we utilize a depth-first search to adaptively substitute new knowledge for the LLMs' original reasoning steps, greedily seeking the optimal path of multi-hop reasoning with new knowledge. From this vantage, we propose DEEPEDIT: Depth-first Search-based Decoding for Knowledge Editing. DEEPEDIT improves the KE of LLMs by enhancing the conciseness, coherence, pertinence, and receptiveness of reasoning with new knowledge. DEEPEDIT is flexibly applicable to any black-box LLM without requiring access to model parameters or token-wise distributions. In addition to DEEPEDIT, we propose two new KE benchmarks: MQuAKE-2002 and MQuAKE-hard, which are designed to provide more precise and challenging assessments of KE approaches. Qualitatively, DEEPEDIT enables LLMs to produce more succinct reasoning outputs in accordance with new knowledge. Quantitatively, it yields significant improvements on multiple KE benchmarks.
- [1141] arXiv:2401.10474 (cross-list from cs.LG) [ pdf , other ]
-
Title: LDReg: Local Dimensionality Regularized Self-Supervised LearningAuthors: Hanxun Huang , Ricardo J. G. B. Campello , Sarah Monazam Erfani , Xingjun Ma , Michael E. Houle , James BaileyComments: ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract: Representations learned via self-supervised learning (SSL) can be susceptible to dimensional collapse, where the learned representation subspace is of extremely low dimensionality and thus fails to represent the full data distribution and modalities. Dimensional collapse also known as the "underfilling" phenomenon is one of the major causes of degraded performance on downstream tasks. Previous work has investigated the dimensional collapse problem of SSL at a global level. In this paper, we demonstrate that representations can span over high dimensional space globally, but collapse locally. To address this, we propose a method called $\textit{local dimensionality regularization (LDReg)}$. Our formulation is based on the derivation of the Fisher-Rao metric to compare and optimize local distance distributions at an asymptotically small radius for each data point. By increasing the local intrinsic dimensionality, we demonstrate through a range of experiments that LDReg improves the representation quality of SSL. The results also show that LDReg can regularize dimensionality at both local and global levels.
- [1142] arXiv:2401.10480 (cross-list from cs.CL) [ pdf , other ]
-
Title: Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step ReasoningAuthors: Yiwei Li , Peiwen Yuan , Shaoxiong Feng , Boyuan Pan , Xinglin Wang , Bin Sun , Heda Wang , Kan LiComments: ICLR 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Self-consistency (SC) has been a widely used decoding strategy for chain-of-thought reasoning. Despite bringing significant performance improvements across a variety of multi-step reasoning tasks, it is a high-cost method that requires multiple sampling with the preset size. In this paper, we propose a simple and scalable sampling process, \textbf{E}arly-Stopping \textbf{S}elf-\textbf{C}onsistency (ESC), to greatly reduce the cost of SC without sacrificing performance. On this basis, one control scheme for ESC is further derivated to dynamically choose the performance-cost balance for different tasks and models. To demonstrate ESC's effectiveness, we conducted extensive experiments on three popular categories of reasoning tasks: arithmetic, commonsense and symbolic reasoning over language models with varying scales. The empirical results show that ESC reduces the average number of sampling of chain-of-thought reasoning by a significant margin on six benchmarks, including MATH (-33.8%), GSM8K (-80.1%), StrategyQA (-76.8%), CommonsenseQA (-78.5%), Coin Flip (-84.2%) and Last Letters (-67.4%), while attaining comparable performances.
- [1143] arXiv:2401.10484 (cross-list from cs.IR) [ pdf , other ]
-
Title: Enhancing Scalability in Recommender Systems through Lottery Ticket Hypothesis and Knowledge Distillation-based Neural Network PruningComments: Accepted in WITS 2023 as a workshop paperSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Abstract: This study introduces an innovative approach aimed at the efficient pruning of neural networks, with a particular focus on their deployment on edge devices. Our method involves the integration of the Lottery Ticket Hypothesis (LTH) with the Knowledge Distillation (KD) framework, resulting in the formulation of three distinct pruning models. These models have been developed to address scalability issue in recommender systems, whereby the complexities of deep learning models have hindered their practical deployment. With judicious application of the pruning techniques, we effectively curtail the power consumption and model dimensions without compromising on accuracy. Empirical evaluation has been performed using two real world datasets from diverse domains against two baselines. Gratifyingly, our approaches yielded a GPU computation-power reduction of up to 66.67%. Notably, our study contributes to the field of recommendation system by pioneering the application of LTH and KD.
- [1144] arXiv:2401.10495 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Causal Layering via Conditional EntropyAuthors: Itai Feigenbaum , Devansh Arpit , Huan Wang , Shelby Heinecke , Juan Carlos Niebles , Weiran Yao , Caiming Xiong , Silvio SavareseSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME)
Abstract: Causal discovery aims to recover information about an unobserved causal graph from the observable data it generates. Layerings are orderings of the variables which place causes before effects. In this paper, we provide ways to recover layerings of a graph by accessing the data via a conditional entropy oracle, when distributions are discrete. Our algorithms work by repeatedly removing sources or sinks from the graph. Under appropriate assumptions and conditioning, we can separate the sources or sinks from the remainder of the nodes by comparing their conditional entropy to the unconditional entropy of their noise. Our algorithms are provably correct and run in worst-case quadratic time. The main assumptions are faithfulness and injective noise, and either known noise entropies or weakly monotonically increasing noise entropies along directed paths. In addition, we require one of either a very mild extension of faithfulness, or strictly monotonically increasing noise entropies, or expanding noise injectivity to include an additional single argument in the structural functions.
- [1145] arXiv:2401.10506 (cross-list from cs.CL) [ pdf , other ]
-
Title: FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial AnalysisAuthors: Chao Zhang , Yuren Mao , Yijiang Fan , Yu Mi , Yunjun Gao , Lu Chen , Dongfang Lou , Jinshu LinComments: 13 pages, 13 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Databases (cs.DB)
Abstract: Text-to-SQL, which provides zero-code interface for operating relational databases, has gained much attention in financial analysis; because, financial professionals may not well-skilled in SQL programming. However, until now, there is no practical Text-to-SQL benchmark dataset for financial analysis, and existing Text-to-SQL methods have not considered the unique characteristics of databases in financial applications, such as commonly existing wide tables. To address these issues, we collect a practical Text-to-SQL benchmark dataset and propose a model-agnostic Large Language Model (LLMs)-based Text-to-SQL framework for financial analysis. The benchmark dataset, BULL, is collected from the practical financial analysis business of Hundsun Technologies Inc., including databases for fund, stock, and macro economy. Besides, the proposed LLMs-based Text-to-SQL framework, FinSQL, provides a systematic treatment for financial Text-to-SQL from the perspectives of prompt construction, parameter-efficient fine-tuning and output calibration. Extensive experimental results on BULL demonstrate that FinSQL achieves the state-of-the-art Text-to-SQL performance at a small cost; furthermore, FinSQL can bring up to 36.64% performance improvement in scenarios requiring few-shot cross-database model transfer.
- [1146] arXiv:2401.10510 (cross-list from cs.NE) [ pdf , other ]
-
Title: A match made in consistency heaven: when large language models meet evolutionary algorithmsComments: A perspective article under reviewSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Pre-trained large language models (LLMs) have powerful capabilities for generating creative natural text. Evolutionary algorithms (EAs) can discover diverse solutions to complex real-world problems. Motivated by the common collective and directionality of text sequence generation and evolution, this paper illustrates the strong consistency of LLMs and EAs, which includes multiple one-to-one key characteristics: token embedding and genotype-phenotype mapping, position encoding and fitness shaping, position embedding and selection, attention and crossover, feed-forward neural network and mutation, model training and parameter update, and multi-task learning and multi-objective optimization. Based on this consistency perspective, existing coupling studies are analyzed, including evolutionary fine-tuning and LLM-enhanced EAs. Leveraging these insights, we outline a fundamental roadmap for future research in coupling LLMs and EAs, while highlighting key challenges along the way. The consistency not only reveals the evolution mechanism behind LLMs but also facilitates the development of evolved artificial agents that approach or surpass biological organisms.
- [1147] arXiv:2401.10516 (cross-list from cs.LG) [ pdf , other ]
-
Title: Episodic Reinforcement Learning with Expanded State-reward SpaceComments: Accepted at AAMAS'24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Empowered by deep neural networks, deep reinforcement learning (DRL) has demonstrated tremendous empirical successes in various domains, including games, health care, and autonomous driving. Despite these advancements, DRL is still identified as data-inefficient as effective policies demand vast numbers of environmental samples. Recently, episodic control (EC)-based model-free DRL methods enable sample efficiency by recalling past experiences from episodic memory. However, existing EC-based methods suffer from the limitation of potential misalignment between the state and reward spaces for neglecting the utilization of (past) retrieval states with extensive information, which probably causes inaccurate value estimation and degraded policy performance. To tackle this issue, we introduce an efficient EC-based DRL framework with expanded state-reward space, where the expanded states used as the input and the expanded rewards used in the training both contain historical and current information. To be specific, we reuse the historical states retrieved by EC as part of the input states and integrate the retrieved MC-returns into the immediate reward in each interactive transition. As a result, our method is able to simultaneously achieve the full utilization of retrieval information and the better evaluation of state values by a Temporal Difference (TD) loss. Empirical results on challenging Box2d and Mujoco tasks demonstrate the superiority of our method over a recent sibling method and common baselines. Further, we also verify our method's effectiveness in alleviating Q-value overestimation by additional experiments of Q-value comparison.
- [1148] arXiv:2401.10521 (cross-list from cs.CL) [ pdf , other ]
-
Title: Cross-lingual Editing in Multilingual Language ModelsComments: Accepted at EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The training of large language models (LLMs) necessitates substantial data and computational resources, and updating outdated LLMs entails significant efforts and resources. While numerous model editing techniques (METs) have emerged to efficiently update model outputs without retraining, their effectiveness in multilingual LLMs, where knowledge is stored in diverse languages, remains an underexplored research area. This research paper introduces the cross-lingual model editing (\textbf{XME}) paradigm, wherein a fact is edited in one language, and the subsequent update propagation is observed across other languages. To investigate the XME paradigm, we conducted experiments using BLOOM, mBERT, and XLM-RoBERTa using the two writing scripts: \textit{Latin} (English, French, and Spanish) and \textit{Indic} (Hindi, Gujarati, and Bengali). The results reveal notable performance limitations of state-of-the-art METs under the XME setting, mainly when the languages involved belong to two distinct script families. These findings highlight the need for further research and development of XME techniques to address these challenges. For more comprehensive information, the dataset used in this research and the associated code are publicly available at the following URL\url{ this https URL }.
- [1149] arXiv:2401.10529 (cross-list from cs.CV) [ pdf , other ]
-
Title: Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image SequencesAuthors: Xiyao Wang , Yuhang Zhou , Xiaoyu Liu , Hongjin Lu , Yuancheng Xu , Feihong He , Jaehong Yoon , Taixi Lu , Gedas Bertasius , Mohit Bansal , Huaxiu Yao , Furong HuangComments: 27 pages, 23 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated proficiency in handling a variety of visual-language tasks. However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less investigated. To address this challenge, this paper introduces Mementos, a new benchmark designed to assess MLLMs' sequential image reasoning abilities. Mementos features 4,761 diverse image sequences with varying lengths. We also employ a GPT-4 assisted method to evaluate MLLM reasoning performance. Through a careful evaluation of nine recent MLLMs on Mementos, including GPT-4V and Gemini, we find that they struggle to accurately describe dynamic information about given image sequences, often leading to hallucinations/misrepresentations of objects and their corresponding behaviors. Our quantitative analysis and case studies identify three key factors impacting MLLMs' sequential image reasoning: the correlation between object and behavioral hallucinations, the influence of cooccurring behaviors, and the compounding impact of behavioral hallucinations. Our dataset is available at this https URL .
- [1150] arXiv:2401.10544 (cross-list from cs.SD) [ pdf , other ]
-
Title: AAT: Adapting Audio Transformer for Various Acoustics Recognition TasksComments: Preprint version for ICASSP 2024, KoreaSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: Recently, Transformers have been introduced into the field of acoustics recognition. They are pre-trained on large-scale datasets using methods such as supervised learning and semi-supervised learning, demonstrating robust generality--It fine-tunes easily to downstream tasks and shows more robust performance. However, the predominant fine-tuning method currently used is still full fine-tuning, which involves updating all parameters during training. This not only incurs significant memory usage and time costs but also compromises the model's generality. Other fine-tuning methods either struggle to address this issue or fail to achieve matching performance. Therefore, we conducted a comprehensive analysis of existing fine-tuning methods and proposed an efficient fine-tuning approach based on Adapter tuning, namely AAT. The core idea is to freeze the audio Transformer model and insert extra learnable Adapters, efficiently acquiring downstream task knowledge without compromising the model's original generality. Extensive experiments have shown that our method achieves performance comparable to or even superior to full fine-tuning while optimizing only 7.118% of the parameters. It also demonstrates superiority over other fine-tuning methods.
- [1151] arXiv:2401.10559 (cross-list from cs.LG) [ pdf , other ]
-
Title: OrchMoE: Efficient Multi-Adapter Learning with Task-Skill SynergyComments: 9 pages, 3 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We advance the field of Parameter-Efficient Fine-Tuning (PEFT) with our novel multi-adapter method, OrchMoE, which capitalizes on modular skill architecture for enhanced forward transfer in neural networks. Unlike prior models that depend on explicit task identification inputs, OrchMoE automatically discerns task categories, streamlining the learning process. This is achieved through an integrated mechanism comprising an Automatic Task Classification module and a Task-Skill Allocation module, which collectively deduce task-specific classifications and tailor skill allocation matrices. Our extensive evaluations on the 'Super Natural Instructions' dataset, featuring 1,600 diverse instructional tasks, indicate that OrchMoE substantially outperforms comparable multi-adapter baselines in terms of both performance and sample utilization efficiency, all while operating within the same parameter constraints. These findings suggest that OrchMoE offers a significant leap forward in multi-task learning efficiency.
- [1152] arXiv:2401.10586 (cross-list from cs.CR) [ pdf , other ]
-
Title: PuriDefense: Randomized Local Implicit Adversarial Purification for Defending Black-box Query-based AttacksSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Black-box query-based attacks constitute significant threats to Machine Learning as a Service (MLaaS) systems since they can generate adversarial examples without accessing the target model's architecture and parameters. Traditional defense mechanisms, such as adversarial training, gradient masking, and input transformations, either impose substantial computational costs or compromise the test accuracy of non-adversarial inputs. To address these challenges, we propose an efficient defense mechanism, PuriDefense, that employs random patch-wise purifications with an ensemble of lightweight purification models at a low level of inference cost. These models leverage the local implicit function and rebuild the natural image manifold. Our theoretical analysis suggests that this approach slows down the convergence of query-based attacks by incorporating randomness into purifications. Extensive experiments on CIFAR-10 and ImageNet validate the effectiveness of our proposed purifier-based defense mechanism, demonstrating significant improvements in robustness against query-based attacks.
- [1153] arXiv:2401.10603 (cross-list from cs.SE) [ pdf , other ]
-
Title: ZnTrack -- Data as CodeComments: 22 pages, 10 figures, 2MB PDFSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The past decade has seen tremendous breakthroughs in computation and there is no indication that this will slow any time soon. Machine learning, large-scale computing resources, and increased industry focus have resulted in rising investments in computer-driven solutions for data management, simulations, and model generation. However, with this growth in computation has come an even larger expansion of data and with it, complexity in data storage, sharing, and tracking. In this work, we introduce ZnTrack, a Python-driven data versioning tool. ZnTrack builds upon established version control systems to provide a user-friendly and easy-to-use interface for tracking parameters in experiments, designing workflows, and storing and sharing data. From this ability to reduce large datasets to a simple Python script emerges the concept of Data as Code, a core component of the work presented here and an undoubtedly important concept as the age of computation continues to evolve. ZnTrack offers an open-source, FAIR data compatible Python package to enable users to harness these concepts of the future.
- [1154] arXiv:2401.10640 (cross-list from cs.CV) [ pdf , other ]
-
Title: A comprehensive study on fidelity metrics for XAISubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The use of eXplainable Artificial Intelligence (XAI) systems has introduced a set of challenges that need resolution. Herein, we focus on how to correctly select an XAI method, an open questions within the field. The inherent difficulty of this task is due to the lack of a ground truth. Several authors have proposed metrics to approximate the fidelity of different XAI methods. These metrics lack verification and have concerning disagreements. In this study, we proposed a novel methodology to verify fidelity metrics, using a well-known transparent model, namely a decision tree. This model allowed us to obtain explanations with perfect fidelity. Our proposal constitutes the first objective benchmark for these metrics, facilitating a comparison of existing proposals, and surpassing existing methods. We applied our benchmark to assess the existing fidelity metrics in two different experiments, each using public datasets comprising 52,000 images. The images from these datasets had a size a 128 by 128 pixels and were synthetic data that simplified the training process. All metric values, indicated a lack of fidelity, with the best one showing a 30 \% deviation from the expected values for perfect explanation. Our experimentation led us to conclude that the current fidelity metrics are not reliable enough to be used in real scenarios. From this finding, we deemed it necessary to development new metrics, to avoid the detected problems, and we recommend the usage of our proposal as a benchmark within the scientific community to address these limitations.
- [1155] arXiv:2401.10641 (cross-list from cs.SI) [ pdf , other ]
-
Title: An Effective Index for Truss-based Community Search on Large Directed GraphsComments: 8 pages, 8figuresSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI)
Abstract: Community search is a derivative of community detection that enables online and personalized discovery of communities and has found extensive applications in massive real-world networks. Recently, there needs to be more focus on the community search issue within directed graphs, even though substantial research has been carried out on undirected graphs. The recently proposed D-truss model has achieved good results in the quality of retrieved communities. However, existing D-truss-based work cannot perform efficient community searches on large graphs because it consumes too many computing resources to retrieve the maximal D-truss. To overcome this issue, we introduce an innovative merge relation known as D-truss-connected to capture the inherent density and cohesiveness of edges within D-truss. This relation allows us to partition all the edges in the original graph into a series of D-truss-connected classes. Then, we construct a concise and compact index, ConDTruss, based on D-truss-connected. Using ConDTruss, the efficiency of maximum D-truss retrieval will be greatly improved, making it a theoretically optimal approach. Experimental evaluations conducted on large directed graph certificate the effectiveness of our proposed method.
- [1156] arXiv:2401.10642 (cross-list from cs.SI) [ pdf , other ]
-
Title: Fast Butterfly-Core Community Search For Large Labeled GraphsComments: 8 pages, 8 figuresSubjects: Social and Information Networks (cs.SI) ; Artificial Intelligence (cs.AI)
Abstract: Community Search (CS) aims to identify densely interconnected subgraphs corresponding to query vertices within a graph. However, existing heterogeneous graph-based community search methods need help identifying cross-group communities and suffer from efficiency issues, making them unsuitable for large graphs. This paper presents a fast community search model based on the Butterfly-Core Community (BCC) structure for heterogeneous graphs. The Random Walk with Restart (RWR) algorithm and butterfly degree comprehensively evaluate the importance of vertices within communities, allowing leader vertices to be rapidly updated to maintain cross-group cohesion. Moreover, we devised a more efficient method for updating vertex distances, which minimizes vertex visits and enhances operational efficiency. Extensive experiments on several real-world temporal graphs demonstrate the effectiveness and efficiency of this solution.
- [1157] arXiv:2401.10643 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: A Comprehensive Survey on Deep-Learning-based Vehicle Re-Identification: Models, Data Sets and ChallengesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract: Vehicle re-identification (ReID) endeavors to associate vehicle images collected from a distributed network of cameras spanning diverse traffic environments. This task assumes paramount importance within the spectrum of vehicle-centric technologies, playing a pivotal role in deploying Intelligent Transportation Systems (ITS) and advancing smart city initiatives. Rapid advancements in deep learning have significantly propelled the evolution of vehicle ReID technologies in recent years. Consequently, undertaking a comprehensive survey of methodologies centered on deep learning for vehicle re-identification has become imperative and inescapable. This paper extensively explores deep learning techniques applied to vehicle ReID. It outlines the categorization of these methods, encompassing supervised and unsupervised approaches, delves into existing research within these categories, introduces datasets and evaluation criteria, and delineates forthcoming challenges and potential research directions. This comprehensive assessment examines the landscape of deep learning in vehicle ReID and establishes a foundation and starting point for future works. It aims to serve as a complete reference by highlighting challenges and emerging trends, fostering advancements and applications in vehicle ReID utilizing deep learning models.
- [1158] arXiv:2401.10660 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Simple Framework to Accelerate Multilingual Language Model for Monolingual Text GenerationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancements in large language models have facilitated the execution of complex language tasks, not only in English but also in non-English languages. However, the tokenizers of most language models, such as Llama, trained on English-centric corpora, tend to excessively fragment tokens in non-English languages. This issue is especially pronounced in non-roman alphabetic languages, which are often divided at a character or even Unicode level, leading to slower text generation. To address this, our study introduces a novel framework designed to expedite text generation in these languages. This framework predicts larger linguistic units than those of conventional multilingual tokenizers and is specifically tailored to the target language, thereby reducing the number of decoding steps required. Our empirical results demonstrate that the proposed framework increases the generation speed by a factor of 1.9 compared to standard decoding while maintaining the performance of a pre-trained multilingual model on monolingual tasks.
- [1159] arXiv:2401.10685 (cross-list from cs.LG) [ pdf , other ]
-
Title: Towards End-to-End GPS Localization with Neural Pseudorange CorrectionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Pseudorange errors are the root cause of localization inaccuracy in GPS. Previous data-driven methods regress and eliminate pseudorange errors using handcrafted intermediate labels. Unlike them, we propose an end-to-end GPS localization framework, E2E-PrNet, to train a neural network for pseudorange correction (PrNet) directly using the final task loss calculated with the ground truth of GPS receiver states. The gradients of the loss with respect to learnable parameters are backpropagated through a differentiable nonlinear least squares optimizer to PrNet. The feasibility is verified with GPS data collected by Android phones, showing that E2E-PrNet outperforms the state-of-the-art end-to-end GPS localization methods.
- [1160] arXiv:2401.10690 (cross-list from cs.LG) [ pdf , other ]
-
Title: Beyond RMSE and MAE: Introducing EAUC to unmask hidden bias and unfairness in dyadic regression modelsAuthors: Jorge Paz-Ruza , Amparo Alonso-Betanzos , Bertha Guijarro-Berdiñas , Brais Cancela , Carlos Eiras-FrancoSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Dyadic regression models, which predict real-valued outcomes for pairs of entities, are fundamental in many domains (e.g. predicting the rating of a user to a product in Recommender Systems) and promising and under exploration in many others (e.g. approximating the adequate dosage of a drug for a patient in personalized pharmacology). In this work, we demonstrate that non-uniformity in the observed value distributions of individual entities leads to severely biased predictions in state-of-the-art models, skewing predictions towards the average of observed past values for the entity and providing worse-than-random predictive power in eccentric yet equally important cases. We show that the usage of global error metrics like Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) is insufficient to capture this phenomenon, which we name eccentricity bias, and we introduce Eccentricity-Area Under the Curve (EAUC) as a new complementary metric that can quantify it in all studied models and datasets. We also prove the adequateness of EAUC by using naive de-biasing corrections to demonstrate that a lower model bias correlates with a lower EAUC and vice-versa. This work contributes a bias-aware evaluation of dyadic regression models to avoid potential unfairness and risks in critical real-world applications of such systems.
- [1161] arXiv:2401.10700 (cross-list from cs.LG) [ pdf , other ]
-
Title: Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion ModelAuthors: Yinan Zheng , Jianxiong Li , Dongjie Yu , Yujie Yang , Shengbo Eben Li , Xianyuan Zhan , Jingjing LiuComments: ICLR 2024, 30pages, 11 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Safe offline RL is a promising way to bypass risky online interactions towards safe policy learning. Most existing methods only enforce soft constraints, i.e., constraining safety violations in expectation below thresholds predetermined. This can lead to potentially unsafe outcomes, thus unacceptable in safety-critical scenarios. An alternative is to enforce the hard constraint of zero violation. However, this can be challenging in offline setting, as it needs to strike the right balance among three highly intricate and correlated aspects: safety constraint satisfaction, reward maximization, and behavior regularization imposed by offline datasets. Interestingly, we discover that via reachability analysis of safe-control theory, the hard safety constraint can be equivalently translated to identifying the largest feasible region given the offline dataset. This seamlessly converts the original trilogy problem to a feasibility-dependent objective, i.e., maximizing reward value within the feasible region while minimizing safety risks in the infeasible region. Inspired by these, we propose FISOR (FeasIbility-guided Safe Offline RL), which allows safety constraint adherence, reward maximization, and offline policy learning to be realized via three decoupled processes, while offering strong safety performance and stability. In FISOR, the optimal policy for the translated optimization problem can be derived in a special form of weighted behavior cloning. Thus, we propose a novel energy-guided diffusion model that does not require training a complicated time-dependent classifier to extract the policy, greatly simplifying the training. We compare FISOR against baselines on DSRL benchmark for safe offline RL. Evaluation results show that FISOR is the only method that can guarantee safety satisfaction in all tasks, while achieving top returns in most tasks.
- [1162] arXiv:2401.10711 (cross-list from cs.CV) [ pdf , other ]
-
Title: Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question AnsweringSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Video Question Answering (VideoQA) aims to answer natural language questions based on the information observed in videos. Despite the recent success of Large Multimodal Models (LMMs) in image-language understanding and reasoning, they deal with VideoQA insufficiently by simply taking uniformly sampled frames as visual inputs, which ignores question-relevant visual clues. Moreover, there are no human annotations for question-critical timestamps in existing VideoQA datasets. In light of this, we propose a novel weakly supervised framework to enforce the LMMs to reason out the answers with question-critical moments as visual inputs. Specifically, we fuse the question and answer pairs as event descriptions to find multiple keyframes as target moments, which will be pseudo-labels. With these pseudo-labels as additionally weak supervision, we devise a lightweight Gaussian-based Contrastive Grounding (GCG) module. GCG learns multiple Gaussian functions to characterize the temporal structure of the video, and sample question-critical frames as positive moments to be the visual inputs of LMMs. Extensive experiments on several VideoQA benchmarks verify the effectiveness of our framework, and we achieve substantial improvements compared to previous state-of-the-art methods.
- [1163] arXiv:2401.10712 (cross-list from cs.CV) [ pdf , other ]
-
Title: Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World KnowledgeSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: With the breakthrough of multi-modal large language models, answering complex visual questions that demand advanced reasoning abilities and world knowledge has become a much more important testbed for developing AI models than ever. However, equipping AI models with robust cross-modality reasoning ability remains challenging since the cognition scheme of humans has not been understood systematically. In this paper, we believe that if we can collect visual clues in the given image as much as possible, we will recognize the image more accurately, understand the question better, recall relevant knowledge more easily, and finally reason out the answer. We discover these rich visual clues by mining question-answer pairs in images and sending them into multi-modal large language models as prompts. We call the proposed method Q&A Prompts. Specifically, we first use the image-answer pairs and the corresponding questions in the training set as inputs and outputs to train a visual question generation model. Then, we use an image tagging model to identify various instances and send packaged image-tag pairs into the visual question generation model to generate relevant questions with the extracted image tags as answers. Finally, we encode these generated question-answer pairs as prompts with a visual-aware prompting module and send them into pre-trained multi-modal large language models to reason out the final answers. Experimental results show that, compared with state-of-the-art methods, our Q&A Prompts achieves substantial improvements on the challenging visual question answering datasets requiring reasoning over diverse world knowledge, such as OK-VQA and A-OKVQA.
- [1164] arXiv:2401.10725 (cross-list from cs.LO) [ pdf , ps , other ]
-
Title: Proceedings 14th International Conference on Automated Deduction in GeometryAuthors: Pedro Quaresma (University of Coimbra, Portugal), Zoltán Kovács (The Private University College of Education of the Diocese of Linz, Austria)Journal-ref: EPTCS 398, 2024Subjects: Logic in Computer Science (cs.LO) ; Artificial Intelligence (cs.AI); Computational Geometry (cs.CG); Mathematical Software (cs.MS)
Abstract: ADG is a forum to exchange ideas and views, to present research results and progress, and to demonstrate software tools at the intersection between geometry and automated deduction. The conference is held every two years. The previous editions of ADG were held in Hagenberg in 2021 (online, postponed from 2020 due to COVID-19), Nanning in 2018, Strasbourg in 2016, Coimbra in 2014, Edinburgh in 2012, Munich in 2010, Shanghai in 2008, Pontevedra in 2006, Gainesville in 2004, Hagenberg in 2002, Zurich in 2000, Beijing in 1998, and Toulouse in 1996.
The 14th edition, ADG 2023, was held in Belgrade, Serbia, in September 20-22, 2023. This edition of ADG had an additional special focus topic, Deduction in Education.
Invited Speakers: Julien Narboux, University of Strasbourg, France "Formalisation, arithmetization and automatisation of geometry"; Filip Marić, University of Belgrade, Serbia, "Automatization, formalization and visualization of hyperbolic geometry"; Zlatan Magajna, University of Ljubljana, Slovenia, "Workshop OK Geometry" - [1165] arXiv:2401.10733 (cross-list from cs.IR) [ pdf , other ]
-
Title: Dynamic Q&A of Clinical Documents with Large Language ModelsComments: 8 pages, 4 figuresSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Electronic health records (EHRs) house crucial patient data in clinical notes. As these notes grow in volume and complexity, manual extraction becomes challenging. This work introduces a natural language interface using large language models (LLMs) for dynamic question-answering on clinical notes. Our chatbot, powered by Langchain and transformer-based LLMs, allows users to query in natural language, receiving relevant answers from clinical notes. Experiments, utilizing various embedding models and advanced LLMs, show Wizard Vicuna's superior accuracy, albeit with high compute demands. Model optimization, including weight quantization, improves latency by approximately 48 times. Promising results indicate potential, yet challenges such as model hallucinations and limited diverse medical case evaluations remain. Addressing these gaps is crucial for unlocking the value in clinical notes and advancing AI-driven clinical decision-making.
- [1166] arXiv:2401.10745 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Ethical Artificial Intelligence Principles and Guidelines for the Governance and Utilization of Highly Advanced Large Language ModelsComments: 4 pages, accepted to workshop on Responsible Language Models (ReLM) at Association of the Advancement of Artificial Intelligence Conference (AAAI 2024)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Abstract: Given the success of ChatGPT, LaMDA and other large language models (LLMs), there has been an increase in development and usage of LLMs within the technology sector and other sectors. While the level in which LLMs has not reached a level where it has surpassed human intelligence, there will be a time when it will. Such LLMs can be referred to as advanced LLMs. Currently, there are limited usage of ethical artificial intelligence (AI) principles and guidelines addressing advanced LLMs due to the fact that we have not reached that point yet. However, this is a problem as once we do reach that point, we will not be adequately prepared to deal with the aftermath of it in an ethical and optimal way, which will lead to undesired and unexpected consequences. This paper addresses this issue by discussing what ethical AI principles and guidelines can be used to address highly advanced LLMs.
- [1167] arXiv:2401.10747 (cross-list from cs.SD) [ pdf , other ]
-
Title: Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer ApproachComments: 5 pagesSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues. However, most of the existing research efforts assume that all modalities are available during both training and testing, making their algorithms susceptible to the missing modality scenario. In this paper, we propose a novel knowledge-transfer network to translate between different modalities to reconstruct the missing audio modalities. Moreover, we develop a cross-modality attention mechanism to retain the maximal information of the reconstructed and observed modalities for sentiment prediction. Extensive experiments on three publicly available datasets demonstrate significant improvements over baselines and achieve comparable results to the previous methods with complete multi-modality supervision.
- [1168] arXiv:2401.10753 (cross-list from cs.AR) [ pdf , other ]
-
Title: BoolGebra: Attributed Graph-learning for Boolean Algebraic ManipulationComments: DATE 2024 extended version. arXiv admin note: text overlap with arXiv:2310.07846Subjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Boolean algebraic manipulation is at the core of logic synthesis in Electronic Design Automation (EDA) design flow. Existing methods struggle to fully exploit optimization opportunities, and often suffer from an explosive search space and limited scalability efficiency. This work presents BoolGebra, a novel attributed graph-learning approach for Boolean algebraic manipulation that aims to improve fundamental logic synthesis. BoolGebra incorporates Graph Neural Networks (GNNs) and takes initial feature embeddings from both structural and functional information as inputs. A fully connected neural network is employed as the predictor for direct optimization result predictions, significantly reducing the search space and efficiently locating the optimization space. The experiments involve training the BoolGebra model w.r.t design-specific and cross-design inferences using the trained model, where BoolGebra demonstrates generalizability for cross-design inference and its potential to scale from small, simple training datasets to large, complex inference datasets. Finally, BoolGebra is integrated with existing synthesis tool ABC to perform end-to-end logic minimization evaluation w.r.t SOTA baselines.
- [1169] arXiv:2401.10759 (cross-list from cs.HC) [ pdf , other ]
-
Title: Interactions with Prompt Problems: A New Way to Teach Programming with Large Language ModelsAuthors: James Prather , Paul Denny , Juho Leinonen , David H. Smith IV , Brent N. Reeves , Stephen MacNeil , Brett A. Becker , Andrew Luxton-Reilly , Thezyrie Amarouche , Bailey KimmelComments: accepted for CHI 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have upended decades of pedagogy in computing education. Students previously learned to code through \textit{writing} many small problems with less emphasis on code reading and comprehension. Recent research has shown that free code generation tools powered by LLMs can solve introductory programming problems presented in natural language with ease. In this paper, we propose a new way to teach programming with Prompt Problems. Students receive a problem visually, indicating how input should be transformed to output, and must translate that to a prompt for an LLM to decipher. The problem is considered correct when the code that is generated by the student prompt can pass all test cases. In this paper we present the design of this tool, discuss student interactions with it as they learn, and provide insights into this new class of programming problems as well as the design tools that integrate LLMs.
- [1170] arXiv:2401.10805 (cross-list from cs.CV) [ pdf , other ]
-
Title: Learning to Visually Connect Actions and their EffectsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: In this work, we introduce the novel concept of visually Connecting Actions and Their Effects (CATE) in video understanding. CATE can have applications in areas like task planning and learning from demonstration. We propose different CATE-based task formulations, such as action selection and action specification, where video understanding models connect actions and effects at semantic and fine-grained levels. We observe that different formulations produce representations capturing intuitive action properties. We also design various baseline models for action selection and action specification. Despite the intuitive nature of the task, we observe that models struggle, and humans outperform them by a large margin. The study aims to establish a foundation for future efforts, showcasing the flexibility and versatility of connecting actions and effects in video understanding, with the hope of inspiring advanced formulations and models.
- [1171] arXiv:2401.10816 (cross-list from cs.HC) [ pdf , other ]
-
Title: Co-Pilot for Health: Personalized Algorithmic AI Nudging to Improve Health OutcomesComments: 20 pages, 2 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The ability to shape health behaviors of large populations automatically, across wearable types and disease conditions at scale has tremendous potential to improve global health outcomes. We designed and implemented an AI driven platform for digital algorithmic nudging, enabled by a Graph-Neural Network (GNN) based Recommendation System, and granular health behavior data from wearable fitness devices. Here we describe the efficacy results of this platform with its capabilities of personalized and contextual nudging to $n=84,764$ individuals over a 12-week period in Singapore. We statistically validated that participants in the target group who received such AI optimized daily nudges increased daily physical activity like step count by 6.17% ($p = 3.09\times10^{-4}$) and weekly minutes of Moderate to Vigorous Physical Activity (MVPA) by 7.61% ($p = 1.16\times10^{-2}$), compared to matched participants in control group who did not receive any nudges. Further, such nudges were very well received, with a 13.1% of nudges sent being opened (open rate), and 11.7% of the opened nudges rated useful compared to 1.9% rated as not useful thereby demonstrating significant improvement in population level engagement metrics.
- [1172] arXiv:2401.10831 (cross-list from cs.CV) [ pdf , other ]
-
Title: Understanding Video Transformers via Universal Concept DiscoveryAuthors: Matthew Kowal , Achal Dave , Rares Ambrus , Adrien Gaidon , Konstantinos G. Derpanis , Pavel TokmakovComments: CVPR 2024 (Highlight)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: This paper studies the problem of concept-based interpretability of transformer representations for videos. Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are automatically discovered. Prior research on concept-based interpretability has concentrated solely on image-level tasks. Comparatively, video models deal with the added temporal dimension, increasing complexity and posing challenges in identifying dynamic concepts over time. In this work, we systematically address these challenges by introducing the first Video Transformer Concept Discovery (VTCD) algorithm. To this end, we propose an efficient approach for unsupervised identification of units of video transformer representations - concepts, and ranking their importance to the output of a model. The resulting concepts are highly interpretable, revealing spatio-temporal reasoning mechanisms and object-centric representations in unstructured video models. Performing this analysis jointly over a diverse set of supervised and self-supervised representations, we discover that some of these mechanism are universal in video transformers. Finally, we show that VTCD can be used for fine-grained action recognition and video object segmentation.
- [1173] arXiv:2401.10839 (cross-list from cs.DC) [ pdf , other ]
-
Title: Holonic Learning: A Flexible Agent-based Distributed Machine Learning FrameworkSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: Ever-increasing ubiquity of data and computational resources in the last decade have propelled a notable transition in the machine learning paradigm towards more distributed approaches. Such a transition seeks to not only tackle the scalability and resource distribution challenges but also to address pressing privacy and security concerns. To contribute to the ongoing discourse, this paper introduces Holonic Learning (HoL), a collaborative and privacy-focused learning framework designed for training deep learning models. By leveraging holonic concepts, the HoL framework establishes a structured self-similar hierarchy in the learning process, enabling more nuanced control over collaborations through the individual model aggregation approach of each holon, along with their intra-holon commitment and communication patterns. HoL, in its general form, provides extensive design and flexibility potentials. For empirical analysis and to demonstrate its effectiveness, this paper implements HoloAvg, a special variant of HoL that employs weighted averaging for model aggregation across all holons. The convergence of the proposed method is validated through experiments on both IID and Non-IID settings of the standard MNISt dataset. Furthermore, the performance behaviors of HoL are investigated under various holarchical designs and data distribution scenarios. The presented results affirm HoL's prowess in delivering competitive performance particularly, in the context of the Non-IID data distribution.
- [1174] arXiv:2401.10840 (cross-list from cs.CY) [ pdf , other ]
-
Title: Symbolic Cognitive Diagnosis via Hybrid Optimization for Intelligent Education SystemsJournal-ref: Published in AAAI 2024Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Cognitive diagnosis assessment is a fundamental and crucial task for student learning. It models the student-exercise interaction, and discovers the students' proficiency levels on each knowledge attribute. In real-world intelligent education systems, generalization and interpretability of cognitive diagnosis methods are of equal importance. However, most existing methods can hardly make the best of both worlds due to the complicated student-exercise interaction. To this end, this paper proposes a symbolic cognitive diagnosis~(SCD) framework to simultaneously enhance generalization and interpretability. The SCD framework incorporates the symbolic tree to explicably represent the complicated student-exercise interaction function, and utilizes gradient-based optimization methods to effectively learn the student and exercise parameters. Meanwhile, the accompanying challenge is that we need to tunnel the discrete symbolic representation and continuous parameter optimization. To address this challenge, we propose to hybridly optimize the representation and parameters in an alternating manner. To fulfill SCD, it alternately learns the symbolic tree by derivative-free genetic programming and learns the student and exercise parameters via gradient-based Adam. The extensive experimental results on various real-world datasets show the superiority of SCD on both generalization and interpretability. The ablation study verifies the efficacy of each ingredient in SCD, and the case study explicitly showcases how the interpretable ability of SCD works.
- [1175] arXiv:2401.10841 (cross-list from cs.CL) [ pdf , other ]
-
Title: Using LLMs to discover emerging coded antisemitic hate-speech in extremist social mediaAuthors: Dhanush Kikkisetti , Raza Ul Mustafa , Wendy Melillo , Roberto Corizzo , Zois Boukouvalas , Jeff Gill , Nathalie JapkowiczComments: 9 pages, 4 figures, 2 algorithms, 3 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: Online hate speech proliferation has created a difficult problem for social media platforms. A particular challenge relates to the use of coded language by groups interested in both creating a sense of belonging for its users and evading detection. Coded language evolves quickly and its use varies over time. This paper proposes a methodology for detecting emerging coded hate-laden terminology. The methodology is tested in the context of online antisemitic discourse. The approach considers posts scraped from social media platforms, often used by extremist users. The posts are scraped using seed expressions related to previously known discourse of hatred towards Jews. The method begins by identifying the expressions most representative of each post and calculating their frequency in the whole corpus. It filters out grammatically incoherent expressions as well as previously encountered ones so as to focus on emergent well-formed terminology. This is followed by an assessment of semantic similarity to known antisemitic terminology using a fine-tuned large language model, and subsequent filtering out of the expressions that are too distant from known expressions of hatred. Emergent antisemitic expressions containing terms clearly relating to Jewish topics are then removed to return only coded expressions of hatred.
- [1176] arXiv:2401.10848 (cross-list from cs.CV) [ pdf , other ]
-
Title: Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose EstimationComments: 36 pages, 9 figures, 50 tables; ICLR 2024 (Poster)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: We consider the problem of source-free unsupervised category-level pose estimation from only RGB images to a target domain without any access to source domain data or 3D annotations during adaptation. Collecting and annotating real-world 3D data and corresponding images is laborious, expensive, yet unavoidable process, since even 3D pose domain adaptation methods require 3D data in the target domain. We introduce 3DUDA, a method capable of adapting to a nuisance-ridden target domain without 3D or depth data. Our key insight stems from the observation that specific object subparts remain stable across out-of-domain (OOD) scenarios, enabling strategic utilization of these invariant subcomponents for effective model updates. We represent object categories as simple cuboid meshes, and harness a generative model of neural feature activations modeled at each mesh vertex learnt using differential rendering. We focus on individual locally robust mesh vertex features and iteratively update them based on their proximity to corresponding features in the target domain even when the global pose is not correct. Our model is then trained in an EM fashion, alternating between updating the vertex features and the feature extractor. We show that our method simulates fine-tuning on a global pseudo-labeled dataset under mild assumptions, which converges to the target domain asymptotically. Through extensive empirical validation, including a complex extreme UDA setup which combines real nuisances, synthetic noise, and occlusion, we demonstrate the potency of our simple approach in addressing the domain shift challenge and significantly improving pose estimation accuracy.
- [1177] arXiv:2401.10850 (cross-list from cs.CL) [ pdf , other ]
-
Title: Advancements in eHealth Data Analytics through Natural Language Processing and Deep LearningSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The healthcare environment is commonly referred to as "information-rich" but also "knowledge poor". Healthcare systems collect huge amounts of data from various sources: lab reports, medical letters, logs of medical tools or programs, medical prescriptions, etc. These massive sets of data can provide great knowledge and information that can improve the medical services, and overall the healthcare domain, such as disease prediction by analyzing the patient's symptoms or disease prevention, by facilitating the discovery of behavioral factors for diseases. Unfortunately, only a relatively small volume of the textual eHealth data is processed and interpreted, an important factor being the difficulty in efficiently performing Big Data operations. In the medical field, detecting domain-specific multi-word terms is a crucial task as they can define an entire concept with a few words. A term can be defined as a linguistic structure or a concept, and it is composed of one or more words with a specific meaning to a domain. All the terms of a domain create its terminology. This chapter offers a critical study of the current, most performant solutions for analyzing unstructured (image and textual) eHealth data. This study also provides a comparison of the current Natural Language Processing and Deep Learning techniques in the eHealth context. Finally, we examine and discuss some of the current issues, and we define a set of research directions in this area.
- [1178] arXiv:2401.10862 (cross-list from cs.LG) [ pdf , other ]
-
Title: Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-TuningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Cryptography and Security (cs.CR)
Abstract: Large Language Models (LLMs) are vulnerable to `Jailbreaking' prompts, a type of attack that can coax these models into generating harmful and illegal content. In this paper, we show that pruning up to 20% of LLM parameters markedly increases their resistance to such attacks without additional training and without sacrificing their performance in standard benchmarks. Intriguingly, we discovered that the enhanced safety observed post-pruning correlates to the initial safety training level of the model, hinting that the effect of pruning could be more general and may hold for other LLM behaviors beyond safety. Additionally, we introduce a curated dataset of 225 harmful tasks across five categories, inserted into ten different Jailbreaking prompts, showing that pruning aids LLMs in concentrating attention on task-relevant tokens in jailbreaking prompts. Lastly, our experiments reveal that the prominent chat models, such as LLaMA-2 Chat, Vicuna, and Mistral Instruct exhibit high susceptibility to jailbreaking attacks, with some categories achieving nearly 70-100% success rate. These insights underline the potential of pruning as a generalizable approach for improving LLM safety, reliability, and potentially other desired behaviors.
- [1179] arXiv:2401.10882 (cross-list from cs.CL) [ pdf , other ]
-
Title: Reinforcement learning for question answering in programming domain using public community scoring as a human feedbackSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: In this study, we investigate the enhancement of the GPT Neo 125M performance in Community Question Answering (CQA) with a focus on programming, through the integration of Reinforcement Learning from Human Feedback (RLHF) and the utilization of scores from Stack Overflow. Two distinct reward model training strategies are employed for fine-tuning with Proximal Policy Optimization (PPO). Notably, the improvements in performance achieved through this method are comparable to those of GPT Neo 2.7B parameter variant. Additionally, an auxiliary scoring mechanism is introduced, which demonstrates the limitations of conventional linguistic metrics in evaluating responses in the programming domain. Through accurate analysis, this paper looks at the divergence between traditional linguistic metrics and our human-preferences-based reward model, underscoring the imperative for domain-specific evaluation methods. By elucidating the complexities involved in applying RLHF to programming CQA and accentuating the significance of context-aware evaluation, this study contributes to the ongoing efforts in refining Large Language Models through focused human feedback.
- [1180] arXiv:2401.10886 (cross-list from cs.CV) [ pdf , other ]
-
Title: SCENES: Subpixel Correspondence Estimation With Epipolar SupervisionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Abstract: Extracting point correspondences from two or more views of a scene is a fundamental computer vision problem with particular importance for relative camera pose estimation and structure-from-motion. Existing local feature matching approaches, trained with correspondence supervision on large-scale datasets, obtain highly-accurate matches on the test sets. However, they do not generalise well to new datasets with different characteristics to those they were trained on, unlike classic feature extractors. Instead, they require finetuning, which assumes that ground-truth correspondences or ground-truth camera poses and 3D structure are available. We relax this assumption by removing the requirement of 3D structure, e.g., depth maps or point clouds, and only require camera pose information, which can be obtained from odometry. We do so by replacing correspondence losses with epipolar losses, which encourage putative matches to lie on the associated epipolar line. While weaker than correspondence supervision, we observe that this cue is sufficient for finetuning existing models on new data. We then further relax the assumption of known camera poses by using pose estimates in a novel bootstrapping approach. We evaluate on highly challenging datasets, including an indoor drone dataset and an outdoor smartphone camera dataset, and obtain state-of-the-art results without strong supervision.
- [1181] arXiv:2401.10889 (cross-list from cs.CV) [ pdf , other ]
-
Title: Synthesizing Moving People with 3D ControlSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we present a diffusion model-based framework for animating people from a single image for a given target 3D motion sequence. Our approach has two core components: a) learning priors about invisible parts of the human body and clothing, and b) rendering novel body poses with proper clothing and texture. For the first part, we learn an in-filling diffusion model to hallucinate unseen parts of a person given a single image. We train this model on texture map space, which makes it more sample-efficient since it is invariant to pose and viewpoint. Second, we develop a diffusion-based rendering pipeline, which is controlled by 3D human poses. This produces realistic renderings of novel poses of the person, including clothing, hair, and plausible in-filling of unseen regions. This disentangled approach allows our method to generate a sequence of images that are faithful to the target motion in the 3D pose and, to the input image in terms of visual similarity. In addition to that, the 3D control allows various synthetic camera trajectories to render a person. Our experiments show that our method is resilient in generating prolonged motions and varied challenging and complex poses compared to prior methods. Please check our website for more details: this https URL .
- [1182] arXiv:2401.10899 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Concrete Problems in AI Safety, RevisitedComments: Published at ICLR workshop on ML in the Real World, 2020Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: As AI systems proliferate in society, the AI community is increasingly preoccupied with the concept of AI Safety, namely the prevention of failures due to accidents that arise from an unanticipated departure of a system's behavior from designer intent in AI deployment. We demonstrate through an analysis of real world cases of such incidents that although current vocabulary captures a range of the encountered issues of AI deployment, an expanded socio-technical framing will be required for a more complete understanding of how AI systems and implemented safety mechanisms fail and succeed in real life.
- [1183] arXiv:2401.10917 (cross-list from cs.IR) [ pdf , other ]
-
Title: Artificial intelligence to automate the systematic review of scientific literatureComments: 25 pages, 3 figures, 1 table, journal paperJournal-ref: Computing, Volume 105, pages 2171-2194, 2023Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Artificial intelligence (AI) has acquired notorious relevance in modern computing as it effectively solves complex tasks traditionally done by humans. AI provides methods to represent and infer knowledge, efficiently manipulate texts and learn from vast amount of data. These characteristics are applicable in many activities that human find laborious or repetitive, as is the case of the analysis of scientific literature. Manually preparing and writing a systematic literature review (SLR) takes considerable time and effort, since it requires planning a strategy, conducting the literature search and analysis, and reporting the findings. Depending on the area under study, the number of papers retrieved can be of hundreds or thousands, meaning that filtering those relevant ones and extracting the key information becomes a costly and error-prone process. However, some of the involved tasks are repetitive and, therefore, subject to automation by means of AI. In this paper, we present a survey of AI techniques proposed in the last 15 years to help researchers conduct systematic analyses of scientific literature. We describe the tasks currently supported, the types of algorithms applied, and available tools proposed in 34 primary studies. This survey also provides a historical perspective of the evolution of the field and the role that humans can play in an increasingly automated SLR process.
- [1184] arXiv:2401.10934 (cross-list from cs.IR) [ pdf , other ]
-
Title: A New Creative Generation Pipeline for Click-Through Rate with Stable Diffusion ModelSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: In online advertising scenario, sellers often create multiple creatives to provide comprehensive demonstrations, making it essential to present the most appealing design to maximize the Click-Through Rate (CTR). However, sellers generally struggle to consider users preferences for creative design, leading to the relatively lower aesthetics and quantities compared to Artificial Intelligence (AI)-based approaches. Traditional AI-based approaches still face the same problem of not considering user information while having limited aesthetic knowledge from designers. In fact that fusing the user information, the generated creatives can be more attractive because different users may have different preferences. To optimize the results, the generated creatives in traditional methods are then ranked by another module named creative ranking model. The ranking model can predict the CTR score for each creative considering user features. However, the two above stages are regarded as two different tasks and are optimized separately. In this paper, we proposed a new automated Creative Generation pipeline for Click-Through Rate (CG4CTR) with the goal of improving CTR during the creative generation stage. Our contributions have 4 parts: 1) The inpainting mode in stable diffusion is firstly applied to creative generation task in online advertising scene. A self-cyclic generation pipeline is proposed to ensure the convergence of training. 2) Prompt model is designed to generate individualized creatives for different user groups, which can further improve the diversity and quality. 3) Reward model comprehensively considers the multimodal features of image and text to improve the effectiveness of creative ranking task, and it is also critical in self-cyclic pipeline. 4) The significant benefits obtained in online and offline experiments verify the significance of our proposed method.
- [1185] arXiv:2401.10935 (cross-list from cs.HC) [ pdf , other ]
-
Title: SeeClick: Harnessing GUI Grounding for Advanced Visual GUI AgentsAuthors: Kanzhi Cheng , Qiushi Sun , Yougang Chu , Fangzhi Xu , Yantao Li , Jianbing Zhang , Zhiyong WuSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Graphical User Interface (GUI) agents are designed to automate complex tasks on digital devices, such as smartphones and desktops. Most existing GUI agents interact with the environment through extracted structured data, which can be notably lengthy (e.g., HTML) and occasionally inaccessible (e.g., on desktops). To alleviate this issue, we propose a novel visual GUI agent -- SeeClick, which only relies on screenshots for task automation. In our preliminary study, we have discovered a key challenge in developing visual GUI agents: GUI grounding -- the capacity to accurately locate screen elements based on instructions. To tackle this challenge, we propose to enhance SeeClick with GUI grounding pre-training and devise a method to automate the curation of GUI grounding data. Along with the efforts above, we have also created ScreenSpot, the first realistic GUI grounding benchmark that encompasses mobile, desktop, and web environments. After pre-training, SeeClick demonstrates significant improvement in ScreenSpot over various baselines. Moreover, comprehensive evaluations on three widely used benchmarks consistently support our finding that advancements in GUI grounding directly correlate with enhanced performance in downstream GUI agent tasks. The model, data and code are available at this https URL .
- [1186] arXiv:2401.10942 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: Machine Unlearning for Recommendation Systems: An InsightComments: In Proceedings of 7th INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING AND COMMUNICATION 2024 ( this https URL )Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This review explores machine unlearning (MUL) in recommendation systems, addressing adaptability, personalization, privacy, and bias challenges. Unlike traditional models, MUL dynamically adjusts system knowledge based on shifts in user preferences and ethical considerations. The paper critically examines MUL's basics, real-world applications, and challenges like algorithmic transparency. It sifts through literature, offering insights into how MUL could transform recommendations, discussing user trust, and suggesting paths for future research in responsible and user-focused artificial intelligence (AI). The document guides researchers through challenges involving the trade-off between personalization and privacy, encouraging contributions to meet practical demands for targeted data removal. Emphasizing MUL's role in secure and adaptive machine learning, the paper proposes ways to push its boundaries. The novelty of this paper lies in its exploration of the limitations of the methods, which highlights exciting prospects for advancing the field.
- [1187] arXiv:2401.10946 (cross-list from cs.HC) [ pdf , other ]
-
Title: Self context-aware emotion perception on human-robot interactionComments: Australasian Conference on Robotics and Automation (ACRA). 2023Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Emotion recognition plays a crucial role in various domains of human-robot interaction. In long-term interactions with humans, robots need to respond continuously and accurately, however, the mainstream emotion recognition methods mostly focus on short-term emotion recognition, disregarding the context in which emotions are perceived. Humans consider that contextual information and different contexts can lead to completely different emotional expressions. In this paper, we introduce self context-aware model (SCAM) that employs a two-dimensional emotion coordinate system for anchoring and re-labeling distinct emotions. Simultaneously, it incorporates its distinctive information retention structure and contextual loss. This approach has yielded significant improvements across audio, video, and multimodal. In the auditory modality, there has been a notable enhancement in accuracy, rising from 63.10% to 72.46%. Similarly, the visual modality has demonstrated improved accuracy, increasing from 77.03% to 80.82%. In the multimodal, accuracy has experienced an elevation from 77.48% to 78.93%. In the future, we will validate the reliability and usability of SCAM on robots through psychology experiments.
- [1188] arXiv:2401.10956 (cross-list from cs.HC) [ pdf , other ]
-
Title: AI Revolution on Chat Bot: Evidence from a Randomized Controlled ExperimentSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: In recent years, generative AI has undergone major advancements, demonstrating significant promise in augmenting human productivity. Notably, large language models (LLM), with ChatGPT-4 as an example, have drawn considerable attention. Numerous articles have examined the impact of LLM-based tools on human productivity in lab settings and designed tasks or in observational studies. Despite recent advances, field experiments applying LLM-based tools in realistic settings are limited. This paper presents the findings of a field randomized controlled trial assessing the effectiveness of LLM-based tools in providing unmonitored support services for information retrieval.
- [1189] arXiv:2401.10965 (cross-list from cs.MA) [ pdf , other ]
-
Title: Decentralizing Coordination in Open Vehicle Fleets for Scalable and Dynamic Task AllocationJournal-ref: Complexity, Volume 2020, Article ID 1047369Subjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: One of the major challenges in the coordination of large, open, collaborative, and commercial vehicle fleets is dynamic task allocation. Self-concerned individually rational vehicle drivers have both local and global objectives, which require coordination using some fair and efficient task allocation method. In this paper, we review the literature on scalable and dynamic task allocation focusing on deterministic and dynamic two-dimensional linear assignment problems. We focus on multiagent system representation of open vehicle fleets where dynamically appearing vehicles are represented by software agents that should be allocated to a set of dynamically appearing tasks. We give a comparison and critical analysis of recent research results focusing on centralized, distributed, and decentralized solution approaches. Moreover, we propose mathematical models for dynamic versions of the following assignment problems well known in combinatorial optimization: the assignment problem, bottleneck assignment problem, fair matching problem, dynamic minimum deviation assignment problem, $\sum_{k}$-assignment problem, the semiassignment problem, the assignment problem with side constraints, and the assignment problem while recognizing agent qualification; all while considering the main aspect of open vehicle fleets: random arrival of tasks and vehicles (agents) that may become available after assisting previous tasks or by participating in the fleet at times based on individual interest.
- [1190] arXiv:2401.11002 (cross-list from cs.CV) [ pdf , other ]
-
Title: Fast Registration of Photorealistic Avatars for VR Facial AnimationComments: Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Virtual Reality (VR) bares promise of social interactions that can feel more immersive than other media. Key to this is the ability to accurately animate a photorealistic avatar of one's likeness while wearing a VR headset. Although high quality registration of person-specific avatars to headset-mounted camera (HMC) images is possible in an offline setting, the performance of generic realtime models are significantly degraded. Online registration is also challenging due to oblique camera views and differences in modality. In this work, we first show that the domain gap between the avatar and headset-camera images is one of the primary sources of difficulty, where a transformer-based architecture achieves high accuracy on domain-consistent data, but degrades when the domain-gap is re-introduced. Building on this finding, we develop a system design that decouples the problem into two parts: 1) an iterative refinement module that takes in-domain inputs, and 2) a generic avatar-guided image-to-image style transfer module that is conditioned on current estimation of expression and head pose. These two modules reinforce each other, as image style transfer becomes easier when close-to-ground-truth examples are shown, and better domain-gap removal helps registration. Our system produces high-quality results efficiently, obviating the need for costly offline registration to generate personalized labels. We validate the accuracy and efficiency of our approach through extensive experiments on a commodity headset, demonstrating significant improvements over direct regression methods as well as offline registration.
- [1191] arXiv:2401.11021 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Analysis and Detection of Multilingual Hate Speech Using Transformer Based Deep LearningComments: 20 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Hate speech is harmful content that directly attacks or promotes hatred against members of groups or individuals based on actual or perceived aspects of identity, such as racism, religion, or sexual orientation. This can affect social life on social media platforms as hateful content shared through social media can harm both individuals and communities. As the prevalence of hate speech increases online, the demand for automated detection as an NLP task is increasing. In this work, the proposed method is using transformer-based model to detect hate speech in social media, like twitter, Facebook, WhatsApp, Instagram, etc. The proposed model is independent of languages and has been tested on Italian, English, German, Bengali. The Gold standard datasets were collected from renowned researcher Zeerak Talat, Sara Tonelli, Melanie Siegel, and Rezaul Karim. The success rate of the proposed model for hate speech detection is higher than the existing baseline and state-of-the-art models with accuracy in Bengali dataset is 89%, in English: 91%, in German dataset 91% and in Italian dataset it is 77%. The proposed algorithm shows substantial improvement to the benchmark method.
- [1192] arXiv:2401.11044 (cross-list from cs.LG) [ pdf , other ]
-
Title: The Significance of Data Abstraction Methods in Machine Learning Classification Processes for Critical Decision-MakingComments: 24 pages, 4 figures, 15 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The applicability of widely adopted machine learning (ML) methods to classification is circumscribed by the imperatives of explicability and uncertainty, particularly evident in domains such as healthcare, behavioural sciences, and finances, wherein accountability assumes priority. Recently, Small and Incomplete Dataset Analyser (SaNDA) has been proposed to enhance the ability to perform classification in such domains, by developing a data abstraction protocol using a ROC curve-based method. This paper focuses on column-wise data transformations called abstractions, which are crucial for SaNDA's classification process and explores alternative abstractions protocols, such as constant binning and quantiles. The best-performing methods have been compared against Random Forest as a baseline for explainable methods. The results suggests that SaNDA can be a viable substitute for Random Forest when data is incomplete, even with minimal missing values. It consistently maintains high accuracy even when half of the dataset is missing, unlike Random Forest which experiences a significant decline in accuracy under similar conditions.
- [1193] arXiv:2401.11061 (cross-list from cs.CV) [ pdf , other ]
-
Title: PhotoBot: Reference-Guided Interactive Photography via Natural LanguageComments: Submitted to the IEEE/RSJ International Conference on Intelligent Robotics and Systems (IROS'24), Abu Dhabi, UAE, Oct 14-18, 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions and then use a large language model (LLM) to retrieve relevant reference images based on a user's language query through text-based reasoning. To correspond the reference image and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across marked appearance variations. Using these features, we compute pose adjustments for an RGB-D camera by solving a perspective-n-point (PnP) problem. We demonstrate our approach using a manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback. We also show that PhotoBot can generalize to other reference sources such as paintings.
- [1194] arXiv:2401.11081 (cross-list from cs.LG) [ pdf , other ]
-
Title: Learning from Aggregate responses: Instance Level versus Bag Level Loss FunctionsComments: To appear in the Twelfth International Conference on Learning Representations (ICLR 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Statistics Theory (math.ST); Machine Learning (stat.ML)
Abstract: Due to the rise of privacy concerns, in many practical applications the training data is aggregated before being shared with the learner, in order to protect privacy of users' sensitive responses. In an aggregate learning framework, the dataset is grouped into bags of samples, where each bag is available only with an aggregate response, providing a summary of individuals' responses in that bag. In this paper, we study two natural loss functions for learning from aggregate responses: bag-level loss and the instance-level loss. In the former, the model is learnt by minimizing a loss between aggregate responses and aggregate model predictions, while in the latter the model aims to fit individual predictions to the aggregate responses. In this work, we show that the instance-level loss can be perceived as a regularized form of the bag-level loss. This observation lets us compare the two approaches with respect to bias and variance of the resulting estimators, and introduce a novel interpolating estimator which combines the two approaches. For linear regression tasks, we provide a precise characterization of the risk of the interpolating estimator in an asymptotic regime where the size of the training set grows in proportion to the features dimension. Our analysis allows us to theoretically understand the effect of different factors, such as bag size on the model prediction risk. In addition, we propose a mechanism for differentially private learning from aggregate responses and derive the optimal bag size in terms of prediction risk-privacy trade-off. We also carry out thorough experiments to corroborate our theory and show the efficacy of the interpolating estimator.
- [1195] arXiv:2401.11085 (cross-list from cs.CV) [ pdf , other ]
-
Title: Adaptive Global-Local Representation Learning and Selection for Cross-Domain Facial Expression RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Domain shift poses a significant challenge in Cross-Domain Facial Expression Recognition (CD-FER) due to the distribution variation across different domains. Current works mainly focus on learning domain-invariant features through global feature adaptation, while neglecting the transferability of local features. Additionally, these methods lack discriminative supervision during training on target datasets, resulting in deteriorated feature representation in target domain. To address these limitations, we propose an Adaptive Global-Local Representation Learning and Selection (AGLRLS) framework. The framework incorporates global-local adversarial adaptation and semantic-aware pseudo label generation to enhance the learning of domain-invariant and discriminative feature during training. Meanwhile, a global-local prediction consistency learning is introduced to improve classification results during inference. Specifically, the framework consists of separate global-local adversarial learning modules that learn domain-invariant global and local features independently. We also design a semantic-aware pseudo label generation module, which computes semantic labels based on global and local features. Moreover, a novel dynamic threshold strategy is employed to learn the optimal thresholds by leveraging independent prediction of global and local features, ensuring filtering out the unreliable pseudo labels while retaining reliable ones. These labels are utilized for model optimization through the adversarial learning process in an end-to-end manner. During inference, a global-local prediction consistency module is developed to automatically learn an optimal result from multiple predictions. We conduct comprehensive experiments and analysis based on a fair evaluation benchmark. The results demonstrate that the proposed framework outperforms the current competing methods by a substantial margin.
- [1196] arXiv:2401.11089 (cross-list from cs.CR) [ pdf , other ]
-
Title: FedRKG: A Privacy-preserving Federated Recommendation Framework via Knowledge Graph EnhancementSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
Abstract: Federated Learning (FL) has emerged as a promising approach for preserving data privacy in recommendation systems by training models locally. Recently, Graph Neural Networks (GNN) have gained popularity in recommendation tasks due to their ability to capture high-order interactions between users and items. However, privacy concerns prevent the global sharing of the entire user-item graph. To address this limitation, some methods create pseudo-interacted items or users in the graph to compensate for missing information for each client. Unfortunately, these methods introduce random noise and raise privacy concerns. In this paper, we propose FedRKG, a novel federated recommendation system, where a global knowledge graph (KG) is constructed and maintained on the server using publicly available item information, enabling higher-order user-item interactions. On the client side, a relation-aware GNN model leverages diverse KG relationships. To protect local interaction items and obscure gradients, we employ pseudo-labeling and Local Differential Privacy (LDP). Extensive experiments conducted on three real-world datasets demonstrate the competitive performance of our approach compared to centralized algorithms while ensuring privacy preservation. Moreover, FedRKG achieves an average accuracy improvement of 4% compared to existing federated learning baselines.
- [1197] arXiv:2401.11113 (cross-list from cs.LG) [ pdf , other ]
-
Title: SleepNet: Attention-Enhanced Robust Sleep Prediction using Dynamic Social NetworksAuthors: Maryam Khalid , Elizabeth B. Klerman , Andrew W. Mchill , Andrew J. K. Phillips , Akane SanoComments: Accepted for publication in Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), 8 (March 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Signal Processing (eess.SP)
Abstract: Sleep behavior significantly impacts health and acts as an indicator of physical and mental well-being. Monitoring and predicting sleep behavior with ubiquitous sensors may therefore assist in both sleep management and tracking of related health conditions. While sleep behavior depends on, and is reflected in the physiology of a person, it is also impacted by external factors such as digital media usage, social network contagion, and the surrounding weather. In this work, we propose SleepNet, a system that exploits social contagion in sleep behavior through graph networks and integrates it with physiological and phone data extracted from ubiquitous mobile and wearable devices for predicting next-day sleep labels about sleep duration. Our architecture overcomes the limitations of large-scale graphs containing connections irrelevant to sleep behavior by devising an attention mechanism. The extensive experimental evaluation highlights the improvement provided by incorporating social networks in the model. Additionally, we conduct robustness analysis to demonstrate the system's performance in real-life conditions. The outcomes affirm the stability of SleepNet against perturbations in input data. Further analyses emphasize the significance of network topology in prediction performance revealing that users with higher eigenvalue centrality are more vulnerable to data perturbations.
- [1198] arXiv:2401.11120 (cross-list from cs.CL) [ pdf , other ]
-
Title: Enhancing Large Language Models for Clinical Decision Support by Incorporating Clinical Practice GuidelinesAuthors: David Oniani , Xizhi Wu , Shyam Visweswaran , Sumit Kapoor , Shravan Kooragayalu , Katelyn Polanska , Yanshan WangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Background Large Language Models (LLMs), enhanced with Clinical Practice Guidelines (CPGs), can significantly improve Clinical Decision Support (CDS). However, methods for incorporating CPGs into LLMs are not well studied. Methods We develop three distinct methods for incorporating CPGs into LLMs: Binary Decision Tree (BDT), Program-Aided Graph Construction (PAGC), and Chain-of-Thought-Few-Shot Prompting (CoT-FSP). To evaluate the effectiveness of the proposed methods, we create a set of synthetic patient descriptions and conduct both automatic and human evaluation of the responses generated by four LLMs: GPT-4, GPT-3.5 Turbo, LLaMA, and PaLM 2. Zero-Shot Prompting (ZSP) was used as the baseline method. We focus on CDS for COVID-19 outpatient treatment as the case study. Results All four LLMs exhibit improved performance when enhanced with CPGs compared to the baseline ZSP. BDT outperformed both CoT-FSP and PAGC in automatic evaluation. All of the proposed methods demonstrated high performance in human evaluation. Conclusion LLMs enhanced with CPGs demonstrate superior performance, as compared to plain LLMs with ZSP, in providing accurate recommendations for COVID-19 outpatient treatment, which also highlights the potential for broader applications beyond the case study.
- [1199] arXiv:2401.11140 (cross-list from cs.CV) [ pdf , other ]
-
Title: Stability Plasticity Decoupled Fine-tuning For Few-shot end-to-end Object DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Few-shot object detection(FSOD) aims to design methods to adapt object detectors efficiently with only few annotated samples. Fine-tuning has been shown to be an effective and practical approach. However, previous works often take the classical base-novel two stage fine-tuning procedure but ignore the implicit stability-plasticity contradiction among different modules. Specifically, the random re-initialized classifiers need more plasticity to adapt to novel samples. The other modules inheriting pre-trained weights demand more stability to reserve their class-agnostic knowledge. Regular fine-tuning which couples the optimization of these two parts hurts the model generalization in FSOD scenarios. In this paper, we find that this problem is prominent in the end-to-end object detector Sparse R-CNN for its multi-classifier cascaded architecture. We propose to mitigate this contradiction by a new three-stage fine-tuning procedure by introducing an addtional plasticity classifier fine-tuning(PCF) stage. We further design the multi-source ensemble(ME) technique to enhance the generalization of the model in the final fine-tuning stage. Extensive experiments verify that our method is effective in regularizing Sparse R-CNN, outperforming previous methods in the FSOD benchmark.
- [1200] arXiv:2401.11143 (cross-list from cs.LG) [ pdf , other ]
-
Title: Gaussian Adaptive Attention is All You Need: Robust Contextual Representations Across Multiple ModalitiesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Abstract: We propose the Multi-Head Gaussian Adaptive Attention Mechanism (GAAM), a novel probabilistic attention framework, and the Gaussian Adaptive Transformer (GAT), designed to enhance information aggregation across multiple modalities, including Speech, Text and Vision. GAAM integrates learnable mean and variance into its attention mechanism, implemented in a Multi-Headed framework enabling it to collectively model any Probability Distribution for dynamic recalibration of feature significance. This method demonstrates significant improvements, especially with highly non-stationary data, surpassing the state-of-the-art attention techniques in model performance (up to approximately +20% in accuracy) by identifying key elements within the feature space. GAAM's compatibility with dot-product-based attention models and relatively low number of parameters showcases its adaptability and potential to boost existing attention frameworks. Empirically, GAAM exhibits superior adaptability and efficacy across a diverse range of tasks, including emotion recognition in speech, image classification, and text classification, thereby establishing its robustness and versatility in handling multi-modal data. Furthermore, we introduce the Importance Factor (IF), a new learning-based metric that enhances the explainability of models trained with GAAM-based methods. Overall, GAAM represents an advancement towards development of better performing and more explainable attention models across multiple modalities.
- [1201] arXiv:2401.11156 (cross-list from cs.CR) [ pdf , other ]
-
Title: Generalizing Speaker Verification for Spoof Awareness in the Embedding SpaceComments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing (doi updated)Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: It is now well-known that automatic speaker verification (ASV) systems can be spoofed using various types of adversaries. The usual approach to counteract ASV systems against such attacks is to develop a separate spoofing countermeasure (CM) module to classify speech input either as a bonafide, or a spoofed utterance. Nevertheless, such a design requires additional computation and utilization efforts at the authentication stage. An alternative strategy involves a single monolithic ASV system designed to handle both zero-effort imposter (non-targets) and spoofing attacks. Such spoof-aware ASV systems have the potential to provide stronger protections and more economic computations. To this end, we propose to generalize the standalone ASV (G-SASV) against spoofing attacks, where we leverage limited training data from CM to enhance a simple backend in the embedding space, without the involvement of a separate CM module during the test (authentication) phase. We propose a novel yet simple backend classifier based on deep neural networks and conduct the study via domain adaptation and multi-task integration of spoof embeddings at the training stage. Experiments are conducted on the ASVspoof 2019 logical access dataset, where we improve the performance of statistical ASV backends on the joint (bonafide and spoofed) and spoofed conditions by a maximum of 36.2% and 49.8% in terms of equal error rates, respectively.
- [1202] arXiv:2401.11174 (cross-list from cs.CV) [ pdf , other ]
-
Title: Pixel-Wise Recognition for Holistic Surgical Scene UnderstandingAuthors: Nicolás Ayobi , Santiago Rodríguez , Alejandra Pérez , Isabela Hernández , Nicolás Aparicio , Eugénie Dessevres , Sebastián Peña , Jessica Santander , Juan Ignacio Caicedo , Nicolás Fernández , Pablo ArbeláezComments: Preprint submitted to Medical Image Analysis. Official extension of previous MICCAI 2022 ( this https URL ) and ISBI 2023 ( this https URL ) orals. Data and codes are available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper presents the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies (GraSP) dataset, a curated benchmark that models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity. Our approach enables a multi-level comprehension of surgical activities, encompassing long-term tasks such as surgical phases and steps recognition and short-term tasks including surgical instrument segmentation and atomic visual actions detection. To exploit our proposed benchmark, we introduce the Transformers for Actions, Phases, Steps, and Instrument Segmentation (TAPIS) model, a general architecture that combines a global video feature extractor with localized region proposals from an instrument segmentation model to tackle the multi-granularity of our benchmark. Through extensive experimentation, we demonstrate the impact of including segmentation annotations in short-term recognition tasks, highlight the varying granularity requirements of each task, and establish TAPIS's superiority over previously proposed baselines and conventional CNN-based models. Additionally, we validate the robustness of our method across multiple public benchmarks, confirming the reliability and applicability of our dataset. This work represents a significant step forward in Endoscopic Vision, offering a novel and comprehensive framework for future research towards a holistic understanding of surgical procedures.
- [1203] arXiv:2401.11188 (cross-list from cs.LG) [ pdf , other ]
-
Title: Fast and Exact Enumeration of Deep Networks Partitions RegionsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: One fruitful formulation of Deep Networks (DNs) enabling their theoretical study and providing practical guidelines to practitioners relies on Piecewise Affine Splines. In that realm, a DN's input-mapping is expressed as per-region affine mapping where those regions are implicitly determined by the model's architecture and form a partition of their input space. That partition -- which is involved in all the results spanned from this line of research -- has so far only been computed on $2/3$-dimensional slices of the DN's input space or estimated by random sampling. In this paper, we provide the first parallel algorithm that does exact enumeration of the DN's partition regions. The proposed algorithm enables one to finally assess the closeness of the commonly employed approximations methods, e.g. based on random sampling of the DN input space. One of our key finding is that if one is only interested in regions with ``large'' volume, then uniform sampling of the space is highly efficient, but that if one is also interested in discovering the ``small'' regions of the partition, then uniform sampling is exponentially costly with the DN's input space dimension. On the other hand, our proposed method has complexity scaling linearly with input dimension and the number of regions.
- [1204] arXiv:2401.11201 (cross-list from cs.IR) [ pdf , other ]
-
Title: Navigating the Thin Line: Examining User Behavior in Search to Detect Engagement and Backfire EffectsComments: 17 pages, 3 figures, ECIR2024 (46th European Conference on Information Retrieval - IR4Good track)Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Opinionated users often seek information that aligns with their preexisting beliefs while dismissing contradictory evidence due to confirmation bias. This conduct hinders their ability to consider alternative stances when searching the web. Despite this, few studies have analyzed how the diversification of search results on disputed topics influences the search behavior of highly opinionated users. To this end, we present a preregistered user study (n = 257) investigating whether different levels (low and high) of bias metrics and search results presentation (with or without AI-predicted stances labels) can affect the stance diversity consumption and search behavior of opinionated users on three debated topics (i.e., atheism, intellectual property rights, and school uniforms). Our results show that exposing participants to (counter-attitudinally) biased search results increases their consumption of attitude-opposing content, but we also found that bias was associated with a trend toward overall fewer interactions within the search page. We also found that 19% of users interacted with queries and search pages but did not select any search results. When we removed these participants in a post-hoc analysis, we found that stance labels increased the diversity of stances consumed by users, particularly when the search results were biased. Our findings highlight the need for future research to explore distinct search scenario settings to gain insight into opinionated users' behavior.
- [1205] arXiv:2401.11212 (cross-list from cs.DC) [ pdf , other ]
-
Title: Programming Distributed Collective Processes in the eXchange CalculusComments: 32 pages, 13 figuresSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Programming Languages (cs.PL)
Abstract: Recent trends like the Internet of Things (IoT) suggest a vision of dense and multi-scale deployments of computing devices in nearly all kinds of environments. A prominent engineering challenge revolves around programming the collective adaptive behaviour of such computational ecosystems. This requires abstractions able to capture concepts like ensembles (dynamic groups of cooperating devices) and collective tasks (joint activities carried out by ensembles). In this work, we consider collections of devices interacting with neighbours and that execute in nearly-synchronised sense-compute-interact rounds, where the computation is given by a single program mapping sensing values and incoming messages to output and outcoming messages. To support programming whole computational collectives, we propose the abstraction of a distributed collective process, which can be used to define at once the ensemble formation logic and its collective task. We formalise the abstraction in the eXchange Calculus (XC), a core functional language based on neighbouring values (maps from neighbours to values) where state and interaction is handled through a single primitive, exchange, and provide a corresponding implementation in the FCPP language. Then, we exercise distributed collective processes using two case studies: multi-hop message propagation and distributed monitoring of spatial properties. Finally, we discuss the features of the abstraction and its suitability for different kinds of distributed computing applications.
- [1206] arXiv:2401.11217 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Hybrid Approach of Transfer Learning and Physics-Informed Modeling: Improving Dissolved Oxygen Concentration Prediction in an Industrial Wastewater Treatment PlantSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Dynamical Systems (math.DS)
Abstract: Constructing first principles models is a challenging task for nonlinear and complex systems such as a wastewater treatment unit. In recent years, data-driven models are widely used to overcome the complexity. However, they often suffer from issues such as missing, low quality or noisy data. Transfer learning is a solution for this issue where knowledge from another task is transferred to target one to increase the prediction performance. In this work, the objective is increasing the prediction performance of an industrial wastewater treatment plant by transferring the knowledge of (i) an open-source simulation model that captures the underlying physics of the process, albeit with dissimilarities to the target plant, (ii) another industrial plant characterized by noisy and limited data but located in the same refinery, and (iii) the model in (ii) and making the objective function of the training problem physics informed where the physics information derived from the open-source model in (ii). The results have shown that test and validation performance are improved up to 27% and 59%, respectively.
- [1207] arXiv:2401.11235 (cross-list from cs.LG) [ pdf , other ]
-
Title: TreeMIL: A Multi-instance Learning Framework for Time Series Anomaly Detection with Inexact SupervisionComments: This paper has been accepted by IEEE ICASSP 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Time series anomaly detection (TSAD) plays a vital role in various domains such as healthcare, networks, and industry. Considering labels are crucial for detection but difficult to obtain, we turn to TSAD with inexact supervision: only series-level labels are provided during the training phase, while point-level anomalies are predicted during the testing phase. Previous works follow a traditional multi-instance learning (MIL) approach, which focuses on encouraging high anomaly scores at individual time steps. However, time series anomalies are not only limited to individual point anomalies, they can also be collective anomalies, typically exhibiting abnormal patterns over subsequences. To address the challenge of collective anomalies, in this paper, we propose a tree-based MIL framework (TreeMIL). We first adopt an N-ary tree structure to divide the entire series into multiple nodes, where nodes at different levels represent subsequences with different lengths. Then, the subsequence features are extracted to determine the presence of collective anomalies. Finally, we calculate point-level anomaly scores by aggregating features from nodes at different levels. Experiments conducted on seven public datasets and eight baselines demonstrate that TreeMIL achieves an average 32.3% improvement in F1- score compared to previous state-of-the-art methods. The code is available at this https URL .
- [1208] arXiv:2401.11249 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Evaluating if trust and personal information privacy concerns are barriers to using health insurance that explicitly utilizes AIComments: Journal of Internet Commerce (2021)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Trust and privacy have emerged as significant concerns in online transactions. Sharing information on health is especially sensitive but it is necessary for purchasing and utilizing health insurance. Evidence shows that consumers are increasingly comfortable with technology in place of humans, but the expanding use of AI potentially changes this. This research explores whether trust and privacy concern are barriers to the adoption of AI in health insurance. Two scenarios are compared: The first scenario has limited AI that is not in the interface and its presence is not explicitly revealed to the consumer. In the second scenario there is an AI interface and AI evaluation, and this is explicitly revealed to the consumer. The two scenarios were modeled and compared using SEM PLS-MGA. The findings show that trust is significantly lower in the second scenario where AI is visible. Privacy concerns are higher with AI but the difference is not statistically significant within the model.
- [1209] arXiv:2401.11252 (cross-list from cs.LG) [ pdf , other ]
-
Title: Automated Fusion of Multimodal Electronic Health Records for Better Medical PredictionsComments: Accepted by SDM 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The widespread adoption of Electronic Health Record (EHR) systems in healthcare institutes has generated vast amounts of medical data, offering significant opportunities for improving healthcare services through deep learning techniques. However, the complex and diverse modalities and feature structures in real-world EHR data pose great challenges for deep learning model design. To address the multi-modality challenge in EHR data, current approaches primarily rely on hand-crafted model architectures based on intuition and empirical experiences, leading to sub-optimal model architectures and limited performance. Therefore, to automate the process of model design for mining EHR data, we propose a novel neural architecture search (NAS) framework named AutoFM, which can automatically search for the optimal model architectures for encoding diverse input modalities and fusion strategies. We conduct thorough experiments on real-world multi-modal EHR data and prediction tasks, and the results demonstrate that our framework not only achieves significant performance improvement over existing state-of-the-art methods but also discovers meaningful network architectures effectively.
- [1210] arXiv:2401.11257 (cross-list from cs.MA) [ pdf , other ]
-
Title: Measuring Policy Distance for Multi-Agent Reinforcement LearningComments: 9 pages, 6 figuresSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Currently, many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL. However, there remains a lack of a general metric to quantify policy differences among agents. Such a metric would not only facilitate the evaluation of the diversity evolution in multi-agent systems, but also provide guidance for the design of diversity-based MARL algorithms. In this paper, we propose the multi-agent policy distance (MAPD), a general tool for measuring policy differences in MARL. By learning the conditional representations of agents' decisions, MAPD can computes the policy distance between any pair of agents. Furthermore, we extend MAPD to a customizable version, which can quantify differences among agent policies on specified aspects. Based on the online deployment of MAPD, we design a multi-agent dynamic parameter sharing (MADPS) algorithm as an example of the MAPD's applications. Extensive experiments demonstrate that our method is effective in measuring differences in agent policies and specific behavioral tendencies. Moreover, in comparison to other methods of parameter sharing, MADPS exhibits superior performance.
- [1211] arXiv:2401.11284 (cross-list from cs.CV) [ pdf , other ]
-
Title: Evaluating Driver Readiness in Conditionally Automated Vehicles from Eye-Tracking Data and Head PoseSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: As automated driving technology advances, the role of the driver to resume control of the vehicle in conditionally automated vehicles becomes increasingly critical. In the SAE Level 3 or partly automated vehicles, the driver needs to be available and ready to intervene when necessary. This makes it essential to evaluate their readiness accurately. This article presents a comprehensive analysis of driver readiness assessment by combining head pose features and eye-tracking data. The study explores the effectiveness of predictive models in evaluating driver readiness, addressing the challenges of dataset limitations and limited ground truth labels. Machine learning techniques, including LSTM architectures, are utilised to model driver readiness based on the Spatio-temporal status of the driver's head pose and eye gaze. The experiments in this article revealed that a Bidirectional LSTM architecture, combining both feature sets, achieves a mean absolute error of 0.363 on the DMD dataset, demonstrating superior performance in assessing driver readiness. The modular architecture of the proposed model also allows the integration of additional driver-specific features, such as steering wheel activity, enhancing its adaptability and real-world applicability.
- [1212] arXiv:2401.11314 (cross-list from cs.HC) [ pdf , other ]
-
Title: CodeAid: Evaluating a Classroom Deployment of an LLM-based Programming Assistant that Balances Student and Educator NeedsAuthors: Majeed Kazemitabaar , Runlong Ye , Xiaoning Wang , Austin Z. Henley , Paul Denny , Michelle Craig , Tovi GrossmanComments: CHI 2024 Paper - The paper includes 17 pages, 8 figures, 2 tables, along with a 2-page appendixSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Timely, personalized feedback is essential for students learning programming. LLM-powered tools like ChatGPT offer instant support, but reveal direct answers with code, which may hinder deep conceptual engagement. We developed CodeAid, an LLM-powered programming assistant delivering helpful, technically correct responses, without revealing code solutions. CodeAid answers conceptual questions, generates pseudo-code with line-by-line explanations, and annotates student's incorrect code with fix suggestions. We deployed CodeAid in a programming class of 700 students for a 12-week semester. A thematic analysis of 8,000 usages of CodeAid was performed, further enriched by weekly surveys, and 22 student interviews. We then interviewed eight programming educators to gain further insights. Our findings reveal four design considerations for future educational AI assistants: D1) exploiting AI's unique benefits; D2) simplifying query formulation while promoting cognitive engagement; D3) avoiding direct responses while encouraging motivated learning; and D4) maintaining transparency and control for students to asses and steer AI responses.
- [1213] arXiv:2401.11316 (cross-list from cs.CL) [ pdf , other ]
-
Title: PRILoRA: Pruned and Rank-Increasing Low-Rank AdaptationComments: EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: With the proliferation of large pre-trained language models (PLMs), fine-tuning all model parameters becomes increasingly inefficient, particularly when dealing with numerous downstream tasks that entail substantial training and storage costs. Several approaches aimed at achieving parameter-efficient fine-tuning (PEFT) have been proposed. Among them, Low-Rank Adaptation (LoRA) stands out as an archetypal method, incorporating trainable rank decomposition matrices into each target module. Nevertheless, LoRA does not consider the varying importance of each layer. To address these challenges, we introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process, considering both the temporary magnitude of weights and the accumulated statistics of the input to any given layer. We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
- [1214] arXiv:2401.11325 (cross-list from cs.LG) [ pdf , other ]
-
Title: Detecting Hidden Triggers: Mapping Non-Markov Reward Functions to MarkovSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Many Reinforcement Learning algorithms assume a Markov reward function to guarantee optimality. However, not all reward functions are known to be Markov. In this paper, we propose a framework for mapping non-Markov reward functions into equivalent Markov ones by learning a Reward Machine - a specialized reward automaton. Unlike the general practice of learning Reward Machines, we do not require a set of high-level propositional symbols from which to learn. Rather, we learn \emph{hidden triggers} directly from data that encode them. We demonstrate the importance of learning Reward Machines versus their Deterministic Finite-State Automata counterparts, for this task, given their ability to model reward dependencies in a single automaton. We formalize this distinction in our learning objective. Our mapping process is constructed as an Integer Linear Programming problem. We prove that our mappings provide consistent expectations for the underlying process. We empirically validate our approach by learning black-box non-Markov Reward functions in the Officeworld Domain. Additionally, we demonstrate the effectiveness of learning dependencies between rewards in a new domain, Breakfastworld.
- [1215] arXiv:2401.11337 (cross-list from cs.CV) [ pdf , other ]
-
Title: Prompting Large Vision-Language Models for Compositional ReasoningSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Vision-language models such as CLIP have shown impressive capabilities in encoding texts and images into aligned embeddings, enabling the retrieval of multimodal data in a shared embedding space. However, these embedding-based models still face challenges in effectively matching images and texts with similar visio-linguistic compositionality, as evidenced by their performance on the recent Winoground dataset. In this paper, we argue that this limitation stems from two factors: the use of single vector representations for complex multimodal data, and the absence of step-by-step reasoning in these embedding-based methods. To address this issue, we make an exploratory step using a novel generative method that prompts large vision-language models (e.g., GPT-4) to depict images and perform compositional reasoning. Our method outperforms other embedding-based methods on the Winoground dataset, and obtains further improvement of up to 10% accuracy when enhanced with the optimal description.
- [1216] arXiv:2401.11360 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: PepHarmony: A Multi-View Contrastive Learning Framework for Integrated Sequence and Structure-Based Peptide EncodingAuthors: Ruochi Zhang , Haoran Wu , Chang Liu , Huaping Li , Yuqian Wu , Kewei Li , Yifan Wang , Yifan Deng , Jiahui Chen , Fengfeng Zhou , Xin GaoComments: 25 pages, 5 figures, 3 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Biomolecules (q-bio.BM)
Abstract: Recent advances in protein language models have catalyzed significant progress in peptide sequence representation. Despite extensive exploration in this field, pre-trained models tailored for peptide-specific needs remain largely unaddressed due to the difficulty in capturing the complex and sometimes unstable structures of peptides. This study introduces a novel multi-view contrastive learning framework PepHarmony for the sequence-based peptide encoding task. PepHarmony innovatively combines both sequence- and structure-level information into a sequence-level encoding module through contrastive learning. We carefully select datasets from the Protein Data Bank (PDB) and AlphaFold database to encompass a broad spectrum of peptide sequences and structures. The experimental data highlights PepHarmony's exceptional capability in capturing the intricate relationship between peptide sequences and structures compared with the baseline and fine-tuned models. The robustness of our model is confirmed through extensive ablation studies, which emphasize the crucial roles of contrastive loss and strategic data sorting in enhancing predictive performance. The proposed PepHarmony framework serves as a notable contribution to peptide representations, and offers valuable insights for future applications in peptide drug discovery and peptide engineering. We have made all the source code utilized in this study publicly accessible via GitHub at this https URL or this http URL .
- [1217] arXiv:2401.11361 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Revolutionizing API Documentation through SummarizationComments: arXiv admin note: text overlap with arXiv:2308.09070Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: This study tackles the challenges associated with interpreting Application Programming Interface (API) documentation, an integral aspect of software development. Official API documentation, while essential, can be lengthy and challenging to navigate, prompting developers to seek unofficial sources such as Stack Overflow. Leveraging the vast user-generated content on Stack Overflow, including code snippets and discussions, we employ BERTopic and extractive summarization to automatically generate concise and informative API summaries. These summaries encompass key insights like general usage, common developer issues, and potential solutions, sourced from the wealth of knowledge on Stack Overflow. Software developers evaluate these summaries for performance, coherence, and interoperability, providing valuable feedback on the practicality of our approach.
- [1218] arXiv:2401.11370 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Self-sustaining Software Systems (S4): Towards Improved Interpretability and AdaptationComments: Accepted at The 1st International Workshop New Trends in Software Architecture (SATrends) 2024Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: Software systems impact society at different levels as they pervasively solve real-world problems. Modern software systems are often so sophisticated that their complexity exceeds the limits of human comprehension. These systems must respond to changing goals, dynamic data, unexpected failures, and security threats, among other variable factors in real-world environments. Systems' complexity challenges their interpretability and requires autonomous responses to dynamic changes. Two main research areas explore autonomous systems' responses: evolutionary computing and autonomic computing. Evolutionary computing focuses on software improvement based on iterative modifications to the source code. Autonomic computing focuses on optimising systems' performance by changing their structure, behaviour, or environment variables. Approaches from both areas rely on feedback loops that accumulate knowledge from the system interactions to inform autonomous decision-making. However, this knowledge is often limited, constraining the systems' interpretability and adaptability. This paper proposes a new concept for interpretable and adaptable software systems: self-sustaining software systems (S4). S4 builds knowledge loops between all available knowledge sources that define modern software systems to improve their interpretability and adaptability. This paper introduces and discusses the S4 concept.
- [1219] arXiv:2401.11374 (cross-list from cs.CL) [ pdf , other ]
-
Title: Language Models as Hierarchy EncodersSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Interpreting hierarchical structures latent in language is a key limitation of current language models (LMs). While previous research has implicitly leveraged these hierarchies to enhance LMs, approaches for their explicit encoding are yet to be explored. To address this, we introduce a novel approach to re-train transformer encoder-based LMs as Hierarchy Transformer encoders (HiTs), harnessing the expansive nature of hyperbolic space. Our method situates the output embedding space of pre-trained LMs within a Poincaré ball with a curvature that adapts to the embedding dimension, followed by re-training on hyperbolic cluster and centripetal losses. These losses are designed to effectively cluster related entities (input as texts) and organise them hierarchically. We evaluate HiTs against pre-trained and fine-tuned LMs, focusing on their capabilities in simulating transitive inference, predicting subsumptions, and transferring knowledge across hierarchies. The results demonstrate that HiTs consistently outperform both pre-trained and fine-tuned LMs in these tasks, underscoring the effectiveness and transferability of our re-trained hierarchy encoders.
- [1220] arXiv:2401.11382 (cross-list from cs.CL) [ pdf , other ]
-
Title: Using Large Language Model for End-to-End Chinese ASR and NERAuthors: Yuang Li , Jiawei Yu , Yanqing Zhao , Min Zhang , Mengxin Ren , Xiaofeng Zhao , Xiaosong Qiao , Chang Su , Miaomiao Ma , Hao YangComments: 5 pages, 2 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Mapping speech tokens to the same feature space as text tokens has become the paradigm for the integration of speech modality into decoder-only large language models (LLMs). An alternative approach is to use an encoder-decoder architecture that incorporates speech features through cross-attention. This approach, however, has received less attention in the literature. In this work, we connect the Whisper encoder with ChatGLM3 and provide in-depth comparisons of these two approaches using Chinese automatic speech recognition (ASR) and name entity recognition (NER) tasks. We evaluate them not only by conventional metrics like the F1 score but also by a novel fine-grained taxonomy of ASR-NER errors. Our experiments reveal that encoder-decoder architecture outperforms decoder-only architecture with a short context, while decoder-only architecture benefits from a long context as it fully exploits all layers of the LLM. By using LLM, we significantly reduced the entity omission errors and improved the entity ASR accuracy compared to the Conformer baseline. Additionally, we obtained a state-of-the-art (SOTA) F1 score of 0.805 on the AISHELL-NER test set by using chain-of-thought (CoT) NER which first infers long-form ASR transcriptions and then predicts NER labels.
- [1221] arXiv:2401.11389 (cross-list from cs.CL) [ pdf , other ]
-
Title: MedLM: Exploring Language Models for Medical Question Answering SystemsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In the face of rapidly expanding online medical literature, automated systems for aggregating and summarizing information are becoming increasingly crucial for healthcare professionals and patients. Large Language Models (LLMs), with their advanced generative capabilities, have shown promise in various NLP tasks, and their potential in the healthcare domain, particularly for Closed-Book Generative QnA, is significant. However, the performance of these models in domain-specific tasks such as medical Q&A remains largely unexplored. This study aims to fill this gap by comparing the performance of general and medical-specific distilled LMs for medical Q&A. We aim to evaluate the effectiveness of fine-tuning domain-specific LMs and compare the performance of different families of Language Models. The study will address critical questions about these models' reliability, comparative performance, and effectiveness in the context of medical Q&A. The findings will provide valuable insights into the suitability of different LMs for specific applications in the medical domain.
- [1222] arXiv:2401.11408 (cross-list from cs.CL) [ pdf , other ]
-
Title: SEBERTNets: Sequence Enhanced BERT Networks for Event Entity Extraction Tasks Oriented to the Finance FieldComments: CCKS 2019Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Event extraction lies at the cores of investment analysis and asset management in the financial field, and thus has received much attention. The 2019 China conference on knowledge graph and semantic computing (CCKS) challenge sets up a evaluation competition for event entity extraction task oriented to the finance field. In this task, we mainly focus on how to extract the event entity accurately, and recall all the corresponding event entity effectively. In this paper, we propose a novel model, Sequence Enhanced BERT Networks (SEBERTNets for short), which can inherit the advantages of the BERT,and while capturing sequence semantic information. In addition, motivated by recommendation system, we propose Hybrid Sequence Enhanced BERT Networks (HSEBERTNets for short), which uses a multi-channel recall method to recall all the corresponding event entity. The experimental results show that, the F1 score of SEBERTNets is 0.905 in the first stage, and the F1 score of HSEBERTNets is 0.934 in the first stage, which demonstarate the effectiveness of our methods.
- [1223] arXiv:2401.11410 (cross-list from cs.LG) [ pdf , other ]
-
Title: Agricultural Recommendation System based on Deep Learning: A Multivariate Weather Forecasting ApproachAuthors: Md Zubair (1), Md. Shahidul Salim (2), Mehrab Mustafy Rahman (3), Mohammad Jahid Ibna Basher (1), Shahin Imran (4), Iqbal H. Sarker (5) ((1) Chittagong University of Engineering & Technology, Chittagong, Bangladesh, (2) Khulna University of Engineering & Technology, Khulna, Bangladesh, (3) Islamic University of Technology, Gazipur, Bangladesh, (4) Khulna Agricultural University, Khulna, Bangladesh, (5) Edith Cowan University, Perth, Australia)Comments: 18 pages, 16 figures and 13 tables. Two figures and one table have been added to this versionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Bangladesh is predominantly an agricultural country, where the agrarian sector plays an essential role in accelerating economic growth and enabling the food security of the people. The performance of this sector has an overwhelming impact on the primary macroeconomic objectives like food security, employment generation, poverty alleviation, human resources development, and other economic and social forces. Although Bangladesh's labor-intensive agriculture has achieved steady increases in food grain production, it often suffered from unfavorable weather conditions such as heavy rainfall, low temperature, and drought. Consequently, these factors hinder the production of food substantially, putting the country's overall food security in danger. In order to have a profitable, sustainable, and farmer-friendly agricultural practice, this paper proposes a context-based crop recommendation system powered by a weather forecast model. With extensive evaluation, the multivariate Stacked Bi-LSTM (three Bi-LSTM layers with a time Distributed layer) Network is employed as the weather forecasting model. The proposed weather model can forecast Rainfall, Temperature, Humidity, and Sunshine for any given location in Bangladesh with an average R-squared value of 0.9824, and the model outperforms other state-of-the-art LSTM models. These predictions guide our system in generating viable farming decisions. Additionally, our full-fledged system is capable of alerting the farmers about extreme weather conditions so that preventive measures can be undertaken to protect the crops. Finally, the system is also adept at making knowledge-based crop suggestions for the flood and drought-prone regions of Bangladesh.
- [1224] arXiv:2401.11414 (cross-list from cs.CV) [ pdf , other ]
-
Title: S$^3$M-Net: Joint Learning of Semantic Segmentation and Stereo Matching for Autonomous DrivingComments: accepted to IEEE Trans. on Intelligent Vehicles (T-IV)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: Semantic segmentation and stereo matching are two essential components of 3D environmental perception systems for autonomous driving. Nevertheless, conventional approaches often address these two problems independently, employing separate models for each task. This approach poses practical limitations in real-world scenarios, particularly when computational resources are scarce or real-time performance is imperative. Hence, in this article, we introduce S$^3$M-Net, a novel joint learning framework developed to perform semantic segmentation and stereo matching simultaneously. Specifically, S$^3$M-Net shares the features extracted from RGB images between both tasks, resulting in an improved overall scene understanding capability. This feature sharing process is realized using a feature fusion adaption (FFA) module, which effectively transforms the shared features into semantic space and subsequently fuses them with the encoded disparity features. The entire joint learning framework is trained by minimizing a novel semantic consistency-guided (SCG) loss, which places emphasis on the structural consistency in both tasks. Extensive experimental results conducted on the vKITTI2 and KITTI datasets demonstrate the effectiveness of our proposed joint learning framework and its superior performance compared to other state-of-the-art single-task networks. Our project webpage is accessible at mias.group/S3M-Net.
- [1225] arXiv:2401.11418 (cross-list from cs.LG) [ pdf , other ]
-
Title: Double-Bounded Optimal Transport for Advanced Clustering and ClassificationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: Optimal transport (OT) is attracting increasing attention in machine learning. It aims to transport a source distribution to a target one at minimal cost. In its vanilla form, the source and target distributions are predetermined, which contracts to the real-world case involving undetermined targets. In this paper, we propose Doubly Bounded Optimal Transport (DB-OT), which assumes that the target distribution is restricted within two boundaries instead of a fixed one, thus giving more freedom for the transport to find solutions. Based on the entropic regularization of DB-OT, three scaling-based algorithms are devised for calculating the optimal solution. We also show that our DB-OT is helpful for barycenter-based clustering, which can avoid the excessive concentration of samples in a single cluster. Then we further develop DB-OT techniques for long-tailed classification which is an emerging and open problem. We first propose a connection between OT and classification, that is, in the classification task, training involves optimizing the Inverse OT to learn the representations, while testing involves optimizing the OT for predictions. With this OT perspective, we first apply DB-OT to improve the loss, and the Balanced Softmax is shown as a special case. Then we apply DB-OT for inference in the testing process. Even with vanilla Softmax trained features, our extensive experimental results show that our method can achieve good results with our improved inference scheme in the testing stage.
- [1226] arXiv:2401.11439 (cross-list from cs.RO) [ pdf , other ]
-
Title: General Flow as Foundation Affordance for Scalable Robot LearningSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: We address the challenge of acquiring real-world manipulation skills with a scalable framework.Inspired by the success of large-scale auto-regressive prediction in Large Language Models (LLMs), we hold the belief that identifying an appropriate prediction target capable of leveraging large-scale datasets is crucial for achieving efficient and universal learning. Therefore, we propose to utilize flow, which represents the future trajectories of 3D points on objects of interest, as an ideal prediction target in robot learning. To exploit scalable data resources, we turn our attention to cross-embodiment datasets. We develop, for the first time, a language-conditioned prediction model directly from large-scale RGBD human video datasets. Our predicted flow offers actionable geometric and physics guidance, thus facilitating stable zero-shot skill transfer in real-world scenarios.We deploy our method with a policy based on closed-loop flow prediction. Remarkably, without any additional training, our method achieves an impressive 81% success rate in human-to-robot skill transfer, covering 18 tasks in 6 scenes. Our framework features the following benefits: (1) scalability: leveraging cross-embodiment data resources; (2) universality: multiple object categories, including rigid, articulated, and soft bodies; (3) stable skill transfer: providing actionable guidance with a small inference domain-gap. These lead to a new pathway towards scalable general robot learning. Data, code, and model weights will be made publicly available.
- [1227] arXiv:2401.11459 (cross-list from cs.AR) [ pdf , other ]
-
Title: AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory TechnologyAuthors: Rongqing Cong , Wenyang He , Mingxuan Li , Bangning Luo , Zebin Yang , Yuchao Yang , Ru Huang , Bonan YanComments: for associated source codes, see this https URLSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) with Transformer architectures have become phenomenal in natural language processing, multimodal generative artificial intelligence, and agent-oriented artificial intelligence. The self-attention module is the most dominating sub-structure inside Transformer-based LLMs. Computation using general-purpose graphics processing units (GPUs) inflicts reckless demand for I/O bandwidth for transferring intermediate calculation results between memories and processing units. To tackle this challenge, this work develops a fully customized vanilla self-attention accelerator, AttentionLego, as the basic building block for constructing spatially expandable LLM processors. AttentionLego provides basic implementation with fully-customized digital logic incorporating Processing-In-Memory (PIM) technology. It is based on PIM-based matrix-vector multiplication and look-up table-based Softmax design. The open-source code is available online: this https URL .
- [1228] arXiv:2401.11471 (cross-list from cs.DC) [ pdf , other ]
-
Title: LR-CNN: Lightweight Row-centric Convolutional Neural Network Training for Memory ReductionAuthors: Zhigang Wang , Hangyu Yang , Ning Wang , Chuanfei Xu , Jie Nie , Zhiqiang Wei , Yu Gu , Ge YuSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI)
Abstract: In the last decade, Convolutional Neural Network with a multi-layer architecture has advanced rapidly. However, training its complex network is very space-consuming, since a lot of intermediate data are preserved across layers, especially when processing high-dimension inputs with a big batch size. That poses great challenges to the limited memory capacity of current accelerators (e.g., GPUs). Existing efforts mitigate such bottleneck by external auxiliary solutions with additional hardware costs, and internal modifications with potential accuracy penalty. Differently, our analysis reveals that computations intra- and inter-layers exhibit the spatial-temporal weak dependency and even complete independency features. That inspires us to break the traditional layer-by-layer (column) dataflow rule. Now operations are novelly re-organized into rows throughout all convolution layers. This lightweight design allows a majority of intermediate data to be removed without any loss of accuracy. We particularly study the weak dependency between two consecutive rows. For the resulting skewed memory consumption, we give two solutions with different favorite scenarios. Evaluations on two representative networks confirm the effectiveness. We also validate that our middle dataflow optimization can be smoothly embraced by existing works for better memory reduction.
- [1229] arXiv:2401.11489 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: MapChange: Enhancing Semantic Change Detection with Temporal-Invariant Historical Maps Based on Deep Triplet NetworkSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Semantic Change Detection (SCD) is recognized as both a crucial and challenging task in the field of image analysis. Traditional methods for SCD have predominantly relied on the comparison of image pairs. However, this approach is significantly hindered by substantial imaging differences, which arise due to variations in shooting times, atmospheric conditions, and angles. Such discrepancies lead to two primary issues: the under-detection of minor yet significant changes, and the generation of false alarms due to temporal variances. These factors often result in unchanged objects appearing markedly different in multi-temporal images. In response to these challenges, the MapChange framework has been developed. This framework introduces a novel paradigm that synergizes temporal-invariant historical map data with contemporary high-resolution images. By employing this combination, the temporal variance inherent in conventional image pair comparisons is effectively mitigated. The efficacy of the MapChange framework has been empirically validated through comprehensive testing on two public datasets. These tests have demonstrated the framework's marked superiority over existing state-of-the-art SCD methods.
- [1230] arXiv:2401.11500 (cross-list from cs.RO) [ pdf , other ]
-
Title: Integration of Large Language Models in Control of EHD Pumps for Precise Color SynthesisSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: This paper presents an innovative approach to integrating Large Language Models (LLMs) with Arduino-controlled Electrohydrodynamic (EHD) pumps for precise color synthesis in automation systems. We propose a novel framework that employs fine-tuned LLMs to interpret natural language commands and convert them into specific operational instructions for EHD pump control. This approach aims to enhance user interaction with complex hardware systems, making it more intuitive and efficient. The methodology involves four key steps: fine-tuning the language model with a dataset of color specifications and corresponding Arduino code, developing a natural language processing interface, translating user inputs into executable Arduino code, and controlling EHD pumps for accurate color mixing. Conceptual experiment results, based on theoretical assumptions, indicate a high potential for accurate color synthesis, efficient language model interpretation, and reliable EHD pump operation. This research extends the application of LLMs beyond text-based tasks, demonstrating their potential in industrial automation and control systems. While highlighting the limitations and the need for real-world testing, this study opens new avenues for AI applications in physical system control and sets a foundation for future advancements in AI-driven automation technologies.
- [1231] arXiv:2401.11504 (cross-list from cs.CL) [ pdf , other ]
-
Title: With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text GenerationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Long text generation, such as novel writing and discourse-level translation with extremely long contexts, presents significant challenges to current language models. Existing methods mainly focus on extending the model's context window through strategies like length extrapolation. However, these approaches demand substantial hardware resources during the training and/or inference phases. Our proposed method, Temp-Lora, introduces an alternative concept. Instead of relying on the KV cache to store all context information, we embeds this information directly into a temporary Lora module. In the process of long text generation, this module is progressively trained with text generated previously. This approach not only efficiently preserves contextual knowledge but also prevents any permanent alteration to the model's parameters given that the module is discarded post-generation. Extensive experiments on the PG19 language modeling benchmark and the GuoFeng discourse-level translation benchmark validate the effectiveness of Temp-Lora. Our results show that: 1) Temp-Lora substantially enhances generation quality for long text, as indicated by a 13.2% decrease in perplexity (PPL) on a subset of PG19, and a 29.3% decrease in PPL along with a 113.2% increase in BLEU score on a subset of GuoFeng, 2) Temp-Lora is compatible with and enhances most existing long text generation methods, and 3) Temp-Lora can greatly reduce computational costs by shortening the context window. For example, we can ensure a moderate improvement in generation quality (a decrease of 3.8% in PPL) while enabling a 51.5% memory usage reduction and a 60.0% decrease in latency for inference.
- [1232] arXiv:2401.11512 (cross-list from cs.LG) [ pdf , other ]
-
Title: Information-Theoretic State Variable Selection for Reinforcement LearningComments: 47 pages, 12 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Abstract: Identifying the most suitable variables to represent the state is a fundamental challenge in Reinforcement Learning (RL). These variables must efficiently capture the information necessary for making optimal decisions. In order to address this problem, in this paper, we introduce the Transfer Entropy Redundancy Criterion (TERC), an information-theoretic criterion, which determines if there is \textit{entropy transferred} from state variables to actions during training. We define an algorithm based on TERC that provably excludes variables from the state that have no effect on the final performance of the agent, resulting in more sample efficient learning. Experimental results show that this speed-up is present across three different algorithm classes (represented by tabular Q-learning, Actor-Critic, and Proximal Policy Optimization (PPO)) in a variety of environments. Furthermore, to highlight the differences between the proposed methodology and the current state-of-the-art feature selection approaches, we present a series of controlled experiments on synthetic data, before generalizing to real-world decision-making tasks. We also introduce a representation of the problem that compactly captures the transfer of information from state variables to actions as Bayesian networks.
- [1233] arXiv:2401.11596 (cross-list from cs.GT) [ pdf , other ]
-
Title: Learning to Maximize Gains From Trade in Small MarketsSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We study the problem of designing a two-sided market (double auction) to maximize the gains from trade (social welfare) under the constraints of (dominant-strategy) incentive compatibility and budget-balance. Our goal is to do so for an unknown distribution from which we are given a polynomial number of samples. Our first result is a general impossibility for the case of correlated distributions of values even between just one seller and two buyers, in contrast to the case of one seller and one buyer (bilateral trade) where this is possible. Our second result is an efficient learning algorithm for one seller and two buyers in the case of independent distributions which is based on a novel algorithm for computing optimal mechanisms for finitely supported and explicitly given independent distributions. Both results rely heavily on characterizations of (dominant-strategy) incentive compatible mechanisms that are strongly budget-balanced.
- [1234] arXiv:2401.11605 (cross-list from cs.CV) [ pdf , other ]
-
Title: Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion TransformersAuthors: Katherine Crowson , Stefan Andreas Baumann , Alex Birch , Tanishq Mathew Abraham , Daniel Z. Kaplan , Enrico ShippoleComments: 20 pages, 13 figures, project page and code available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We present the Hourglass Diffusion Transformer (HDiT), an image generative model that exhibits linear scaling with pixel count, supporting training at high-resolution (e.g. $1024 \times 1024$) directly in pixel-space. Building on the Transformer architecture, which is known to scale to billions of parameters, it bridges the gap between the efficiency of convolutional U-Nets and the scalability of Transformers. HDiT trains successfully without typical high-resolution training techniques such as multiscale architectures, latent autoencoders or self-conditioning. We demonstrate that HDiT performs competitively with existing models on ImageNet $256^2$, and sets a new state-of-the-art for diffusion models on FFHQ-$1024^2$.
- [1235] arXiv:2401.11609 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph Edits for Counterfactual Explanations: A comparative studySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Counterfactuals have been established as a popular explainability technique which leverages a set of minimal edits to alter the prediction of a classifier. When considering conceptual counterfactuals on images, the edits requested should correspond to salient concepts present in the input data. At the same time, conceptual distances are defined by knowledge graphs, ensuring the optimality of conceptual edits. In this work, we extend previous endeavors on graph edits as counterfactual explanations by conducting a comparative study which encompasses both supervised and unsupervised Graph Neural Network (GNN) approaches. To this end, we pose the following significant research question: should we represent input data as graphs, which is the optimal GNN approach in terms of performance and time efficiency to generate minimal and meaningful counterfactual explanations for black-box image classifiers?
- [1236] arXiv:2401.11618 (cross-list from cs.LG) [ pdf , other ]
-
Title: Efficient local linearity regularization to overcome catastrophic overfittingComments: Accepted in ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Abstract: Catastrophic overfitting (CO) in single-step adversarial training (AT) results in abrupt drops in the adversarial test accuracy (even down to 0%). For models trained with multi-step AT, it has been observed that the loss function behaves locally linearly with respect to the input, this is however lost in single-step AT. To address CO in single-step AT, several methods have been proposed to enforce local linearity of the loss via regularization. However, these regularization terms considerably slow down training due to Double Backpropagation. Instead, in this work, we introduce a regularization term, called ELLE, to mitigate CO effectively and efficiently in classical AT evaluations, as well as some more difficult regimes, e.g., large adversarial perturbations and long training schedules. Our regularization term can be theoretically linked to curvature of the loss function and is computationally cheaper than previous methods by avoiding Double Backpropagation. Our thorough experimental validation demonstrates that our work does not suffer from CO, even in challenging settings where previous works suffer from it. We also notice that adapting our regularization parameter during training (ELLE-A) greatly improves the performance, specially in large $\epsilon$ setups. Our implementation is available in this https URL .
- [1237] arXiv:2401.11624 (cross-list from cs.CL) [ pdf , other ]
-
Title: In-context Learning with Retrieved Demonstrations for Language Models: A SurveySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Language models, especially pre-trained large language models, have showcased remarkable abilities as few-shot in-context learners (ICL), adept at adapting to new tasks with just a few demonstrations in the input context. However, the model's ability to perform ICL is sensitive to the choice of the few-shot demonstrations. Instead of using a fixed set of demonstrations, one recent development is to retrieve demonstrations tailored to each input query. The implementation of demonstration retrieval is relatively straightforward, leveraging existing databases and retrieval systems. This not only improves the efficiency and scalability of the learning process but also has been shown to reduce biases inherent in manual example selection. In light of the encouraging results and growing research in ICL with retrieved demonstrations, we conduct an extensive review of studies in this area. In this survey, we discuss and compare different design choices for retrieval models, retrieval training procedures, and inference algorithms.
- [1238] arXiv:2401.11627 (cross-list from cs.LG) [ pdf , other ]
-
Title: Tight Verification of Probabilistic Robustness in Bayesian Neural NetworksComments: Accepted at AISTATS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
Abstract: We introduce two algorithms for computing tight guarantees on the probabilistic robustness of Bayesian Neural Networks (BNNs). Computing robustness guarantees for BNNs is a significantly more challenging task than verifying the robustness of standard Neural Networks (NNs) because it requires searching the parameters' space for safe weights. Moreover, tight and complete approaches for the verification of standard NNs, such as those based on Mixed-Integer Linear Programming (MILP), cannot be directly used for the verification of BNNs because of the polynomial terms resulting from the consecutive multiplication of variables encoding the weights. Our algorithms efficiently and effectively search the parameters' space for safe weights by using iterative expansion and the network's gradient and can be used with any verification algorithm of choice for BNNs. In addition to proving that our algorithms compute tighter bounds than the SoA, we also evaluate our algorithms against the SoA on standard benchmarks, such as MNIST and CIFAR10, showing that our algorithms compute bounds up to 40% tighter than the SoA.
- [1239] arXiv:2401.11633 (cross-list from cs.CV) [ pdf , other ]
-
Title: Zoom-shot: Fast and Efficient Unsupervised Zero-Shot Transfer of CLIP to Vision Encoders with Multimodal LossComments: 15 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The fusion of vision and language has brought about a transformative shift in computer vision through the emergence of Vision-Language Models (VLMs). However, the resource-intensive nature of existing VLMs poses a significant challenge. We need an accessible method for developing the next generation of VLMs. To address this issue, we propose Zoom-shot, a novel method for transferring the zero-shot capabilities of CLIP to any pre-trained vision encoder. We do this by exploiting the multimodal information (i.e. text and image) present in the CLIP latent space through the use of specifically designed multimodal loss functions. These loss functions are (1) cycle-consistency loss and (2) our novel prompt-guided knowledge distillation loss (PG-KD). PG-KD combines the concept of knowledge distillation with CLIP's zero-shot classification, to capture the interactions between text and image features. With our multimodal losses, we train a $\textbf{linear mapping}$ between the CLIP latent space and the latent space of a pre-trained vision encoder, for only a $\textbf{single epoch}$. Furthermore, Zoom-shot is entirely unsupervised and is trained using $\textbf{unpaired}$ data. We test the zero-shot capabilities of a range of vision encoders augmented as new VLMs, on coarse and fine-grained classification datasets, outperforming the previous state-of-the-art in this problem domain. In our ablations, we find Zoom-shot allows for a trade-off between data and compute during training; and our state-of-the-art results can be obtained by reducing training from 20% to 1% of the ImageNet training data with 20 epochs. All code and models are available on GitHub.
- [1240] arXiv:2401.11647 (cross-list from cs.LG) [ pdf , other ]
-
Title: LW-FedSSL: Resource-efficient Layer-wise Federated Self-supervised LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Many recent studies integrate federated learning (FL) with self-supervised learning (SSL) to take advantage of raw training data distributed across edge devices. However, edge devices often struggle with high computation and communication costs imposed by SSL and FL algorithms. To tackle this hindrance, we propose LW-FedSSL, a layer-wise federated self-supervised learning approach that allows edge devices to incrementally train one layer of the model at a time. LW-FedSSL comprises server-side calibration and representation alignment mechanisms to maintain comparable performance with end-to-end FedSSL while significantly lowering clients' resource requirements. The server-side calibration mechanism takes advantage of the resource-rich server in an FL environment to assist in global model training. Meanwhile, the representation alignment mechanism encourages closeness between representations of FL local models and those of the global model. Our experiments show that LW-FedSSL has a $3.3 \times$ lower memory requirement and a $3.2 \times$ cheaper communication cost than its end-to-end counterpart. We also explore a progressive training strategy called Prog-FedSSL that outperforms end-to-end training with a similar memory requirement and a $1.8 \times$ cheaper communication cost.
- [1241] arXiv:2401.11648 (cross-list from cs.LG) [ pdf , other ]
-
Title: Next Visit Diagnosis Prediction via Medical Code-Centric Multimodal Contrastive EHR Modelling with Hierarchical RegularisationAuthors: Heejoon KooComments: Accepted to EACL 2024 (The 18th Conference of the European Chapter of the Association for Computational Linguistics)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Predicting next visit diagnosis using Electronic Health Records (EHR) is an essential task in healthcare, critical for devising proactive future plans for both healthcare providers and patients. Nonetheless, many preceding studies have not sufficiently addressed the heterogeneous and hierarchical characteristics inherent in EHR data, inevitably leading to sub-optimal performance. To this end, we propose NECHO, a novel medical code-centric multimodal contrastive EHR learning framework with hierarchical regularisation. First, we integrate multifaceted information encompassing medical codes, demographics, and clinical notes using a tailored network design and a pair of bimodal contrastive losses, all of which pivot around a medical codes representation. We also regularise modality-specific encoders using a parental level information in medical ontology to learn hierarchical structure of EHR data. A series of experiments on MIMIC-III data demonstrates effectiveness of our approach.
- [1242] arXiv:2401.11660 (cross-list from cs.LG) [ pdf , other ]
-
Title: Differentiable Tree Search in Latent State SpaceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In decision-making problems with limited training data, policy functions approximated using deep neural networks often exhibit suboptimal performance. An alternative approach involves learning a world model from the limited data and determining actions through online search. However, the performance is adversely affected by compounding errors arising from inaccuracies in the learnt world model. While methods like TreeQN have attempted to address these inaccuracies by incorporating algorithmic structural biases into their architectures, the biases they introduce are often weak and insufficient for complex decision-making tasks. In this work, we introduce Differentiable Tree Search (DTS), a novel neural network architecture that significantly strengthens the inductive bias by embedding the algorithmic structure of a best-first online search algorithm. DTS employs a learnt world model to conduct a fully differentiable online search in latent state space. The world model is jointly optimised with the search algorithm, enabling the learning of a robust world model and mitigating the effect of model inaccuracies. We address potential Q-function discontinuities arising from naive incorporation of best-first search by adopting a stochastic tree expansion policy, formulating search tree expansion as a decision-making task, and introducing an effective variance reduction technique for the gradient computation. We evaluate DTS in an offline-RL setting with a limited training data scenario on Procgen games and grid navigation task, and demonstrate that DTS outperforms popular model-free and model-based baselines.
- [1243] arXiv:2401.11664 (cross-list from cs.LG) [ pdf , other ]
-
Title: Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAMAuthors: Bingbing Li , Geng Yuan , Zigeng Wang , Shaoyi Huang , Hongwu Peng , Payman Behnam , Wujie Wen , Hang Liu , Caiwen DingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Abstract: Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs) due to its support for parallel in-situ matrix-vector multiplication. However, hardware failures, such as stuck-at-fault defects, can result in significant prediction errors during model inference. While additional crossbars can be used to address these failures, they come with storage overhead and are not efficient in terms of space, energy, and cost. In this paper, we propose a fault protection mechanism that incurs zero space cost. Our approach includes: 1) differentiable structure pruning of rows and columns to reduce model redundancy, 2) weight duplication and voting for robust output, and 3) embedding duplicated most significant bits (MSBs) into the model weight. We evaluate our method on nine tasks of the GLUE benchmark with the BERT model, and experimental results prove its effectiveness.
- [1244] arXiv:2401.11666 (cross-list from cs.LG) [ pdf , other ]
-
Title: P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision TransformerComments: Accepted by the 49th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model, causing performance degradation when these agents face new tasks. In our work, we propose a novel solution - the Progressive Prompt Decision Transformer (P2DT). This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies. Our approach mitigates forgetting in continual and offline reinforcement learning scenarios. Moreover, P2DT leverages trajectories collected via traditional reinforcement learning from all tasks and generates new task-specific tokens during training, thereby retaining knowledge from previous studies. Preliminary results demonstrate that our model effectively alleviates catastrophic forgetting and scales well with increasing task environments.
- [1245] arXiv:2401.11669 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: An Improved Grey Wolf Optimization Algorithm for Heart Disease PredictionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: This paper presents a unique solution to challenges in medical image processing by incorporating an adaptive curve grey wolf optimization (ACGWO) algorithm into neural network backpropagation. Neural networks show potential in medical data but suffer from issues like overfitting and lack of interpretability due to imbalanced and scarce data. Traditional Gray Wolf Optimization (GWO) also has its drawbacks, such as a lack of population diversity and premature convergence. This paper addresses these problems by introducing an adaptive algorithm, enhancing the standard GWO with a sigmoid function. This algorithm was extensively compared to four leading algorithms using six well-known test functions, outperforming them effectively. Moreover, by utilizing the ACGWO, we increase the robustness and generalization of the neural network, resulting in more interpretable predictions. Applied to the publicly accessible Cleveland Heart Disease dataset, our technique surpasses ten other methods, achieving 86.8% accuracy, indicating its potential for efficient heart disease prediction in the clinical setting.
- [1246] arXiv:2401.11674 (cross-list from cs.CV) [ pdf , other ]
-
Title: Memory-Efficient Prompt Tuning for Incremental Histopathology ClassificationComments: Accepted by AAAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recent studies have made remarkable progress in histopathology classification. Based on current successes, contemporary works proposed to further upgrade the model towards a more generalizable and robust direction through incrementally learning from the sequentially delivered domains. Unlike previous parameter isolation based approaches that usually demand massive computation resources during model updating, we present a memory-efficient prompt tuning framework to cultivate model generalization potential in economical memory cost. For each incoming domain, we reuse the existing parameters of the initial classification model and attach lightweight trainable prompts into it for customized tuning. Considering the domain heterogeneity, we perform decoupled prompt tuning, where we adopt a domain-specific prompt for each domain to independently investigate its distinctive characteristics, and one domain-invariant prompt shared across all domains to continually explore the common content embedding throughout time. All domain-specific prompts will be appended to the prompt bank and isolated from further changes to prevent forgetting the distinctive features of early-seen domains. While the domain-invariant prompt will be passed on and iteratively evolve by style-augmented prompt refining to improve model generalization capability over time. In specific, we construct a graph with existing prompts and build a style-augmented graph attention network to guide the domain-invariant prompt exploring the overlapped latent embedding among all delivered domains for more domain generic representations. We have extensively evaluated our framework with two histopathology tasks, i.e., breast cancer metastasis classification and epithelium-stroma tissue classification, where our approach yielded superior performance and memory efficiency over the competing methods.
- [1247] arXiv:2401.11698 (cross-list from cs.LG) [ pdf , other ]
-
Title: Admission Prediction in Undergraduate Applications: an Interpretable Deep Learning ApproachComments: This paper has been accepted for Transdisciplinary AI 2023 conferenceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This article addresses the challenge of validating the admission committee's decisions for undergraduate admissions. In recent years, the traditional review process has struggled to handle the overwhelmingly large amount of applicants' data. Moreover, this traditional assessment often leads to human bias, which might result in discrimination among applicants. Although classical machine learning-based approaches exist that aim to verify the quantitative assessment made by the application reviewers, these methods lack scalability and suffer from performance issues when a large volume of data is in place. In this context, we propose deep learning-based classifiers, namely Feed-Forward and Input Convex neural networks, which overcome the challenges faced by the existing methods. Furthermore, we give additional insights into our model by incorporating an interpretability module, namely LIME. Our training and test datasets comprise applicants' data with a wide range of variables and information. Our models achieve higher accuracy compared to the best-performing traditional machine learning-based approach by a considerable margin of 3.03\%. Additionally, we show the sensitivity of different features and their relative impacts on the overall admission decision using the LIME technique.
- [1248] arXiv:2401.11699 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Dissecting Bias of ChatGPT in College Major RecommendationsAuthors: Alex ZhengSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: I investigate bias in terms of ChatGPT's college major recommendations for students with various profiles, looking at demographic disparities in factors such as race, gender, and socioeconomic status, as well as educational disparities such as score percentiles. By constructing prompts for the ChatGPT API, allowing the model to recommend majors based on high school student profiles, I evaluate bias using various metrics, including the Jaccard Coefficient, Wasserstein Metric, and STEM Disparity Score. The results of this study reveal a significant disparity in the set of recommended college majors, irrespective of the bias metric applied.
- [1249] arXiv:2401.11705 (cross-list from cs.IR) [ pdf , other ]
-
Title: Domain-Aware Cross-Attention for Cross-domain RecommendationAuthors: Yuhao Luo , Shiwei Ma , Mingjun Nie , Changping Peng , Zhangang Lin , Jingping Shao , Qianfang XuComments: 6 pages, 1 figureSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Cross-domain recommendation (CDR) is an important method to improve recommender system performance, especially when observations in target domains are sparse. However, most existing cross-domain recommendations fail to fully utilize the target domain's special features and are hard to be generalized to new domains. The designed network is complex and is not suitable for rapid industrial deployment. Our method introduces a two-step domain-aware cross-attention, extracting transferable features of the source domain from different granularity, which allows the efficient expression of both domain and user interests. In addition, we simplify the training process, and our model can be easily deployed on new domains. We conduct experiments on both public datasets and industrial datasets, and the experimental results demonstrate the effectiveness of our method. We have also deployed the model in an online advertising system and observed significant improvements in both Click-Through-Rate (CTR) and effective cost per mille (ECPM).
- [1250] arXiv:2401.11713 (cross-list from cs.CV) [ pdf , other ]
-
Title: Medical Image Debiasing by Learning Adaptive Agreement from a Biased CouncilComments: 10 pages, 5 figures, 3 tables. Code and benchmark will be released via this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Deep learning could be prone to learning shortcuts raised by dataset bias and result in inaccurate, unreliable, and unfair models, which impedes its adoption in real-world clinical applications. Despite its significance, there is a dearth of research in the medical image classification domain to address dataset bias. Furthermore, the bias labels are often agnostic, as identifying biases can be laborious and depend on post-hoc interpretation. This paper proposes learning Adaptive Agreement from a Biased Council (Ada-ABC), a debiasing framework that does not rely on explicit bias labels to tackle dataset bias in medical images. Ada-ABC develops a biased council consisting of multiple classifiers optimized with generalized cross entropy loss to learn the dataset bias. A debiasing model is then simultaneously trained under the guidance of the biased council. Specifically, the debiasing model is required to learn adaptive agreement with the biased council by agreeing on the correctly predicted samples and disagreeing on the wrongly predicted samples by the biased council. In this way, the debiasing model could learn the target attribute on the samples without spurious correlations while also avoiding ignoring the rich information in samples with spurious correlations. We theoretically demonstrated that the debiasing model could learn the target features when the biased model successfully captures dataset bias. Moreover, to our best knowledge, we constructed the first medical debiasing benchmark from four datasets containing seven different bias scenarios. Our extensive experiments practically showed that our proposed Ada-ABC outperformed competitive approaches, verifying its effectiveness in mitigating dataset bias for medical image classification. The codes and organized benchmark datasets will be made publicly available.
- [1251] arXiv:2401.11719 (cross-list from cs.CV) [ pdf , other ]
-
Title: SFC: Shared Feature Calibration in Weakly Supervised Semantic SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Image-level weakly supervised semantic segmentation has received increasing attention due to its low annotation cost. Existing methods mainly rely on Class Activation Mapping (CAM) to obtain pseudo-labels for training semantic segmentation models. In this work, we are the first to demonstrate that long-tailed distribution in training data can cause the CAM calculated through classifier weights over-activated for head classes and under-activated for tail classes due to the shared features among head- and tail- classes. This degrades pseudo-label quality and further influences final semantic segmentation performance. To address this issue, we propose a Shared Feature Calibration (SFC) method for CAM generation. Specifically, we leverage the class prototypes that carry positive shared features and propose a Multi-Scaled Distribution-Weighted (MSDW) consistency loss for narrowing the gap between the CAMs generated through classifier weights and class prototypes during training. The MSDW loss counterbalances over-activation and under-activation by calibrating the shared features in head-/tail-class classifier weights. Experimental results show that our SFC significantly improves CAM boundaries and achieves new state-of-the-art performances. The project is available at this https URL .
- [1252] arXiv:2401.11720 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph Condensation: A SurveySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The burgeoning volume of graph data poses significant challenges in storage, transmission, and particularly the training of graph neural networks (GNNs). To address these challenges, graph condensation (GC) has emerged as an innovative solution. GC focuses on synthesizing a compact yet highly representative graph, on which GNNs can achieve performance comparable to trained on the large original graph. The notable efficacy of GC and its broad prospects have garnered significant attention and spurred extensive research. This survey paper provides an up-to-date and systematic overview of GC, organizing existing research into four categories aligned with critical GC evaluation criteria: effectiveness, generalization, fairness, and efficiency. To facilitate an in-depth and comprehensive understanding of GC, we examine various methods under each category and thoroughly discuss two essential components within GC: optimization strategies and condensed graph generation. Additionally, we introduce the applications of GC in a variety of fields, and highlight the present challenges and novel insights in GC, promoting advancements in future research.
- [1253] arXiv:2401.11723 (cross-list from cs.CR) [ pdf , other ]
-
Title: Unraveling Attacks in Machine Learning-based IoT Ecosystems: A Survey and the Open Libraries Behind ThemSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The advent of the Internet of Things (IoT) has brought forth an era of unprecedented connectivity, with an estimated 80 billion smart devices expected to be in operation by the end of 2025. These devices facilitate a multitude of smart applications, enhancing the quality of life and efficiency across various domains. Machine Learning (ML) serves as a crucial technology, not only for analyzing IoT-generated data but also for diverse applications within the IoT ecosystem. For instance, ML finds utility in IoT device recognition, anomaly detection, and even in uncovering malicious activities. This paper embarks on a comprehensive exploration of the security threats arising from ML's integration into various facets of IoT, spanning various attack types including membership inference, adversarial evasion, reconstruction, property inference, model extraction, and poisoning attacks. Unlike previous studies, our work offers a holistic perspective, categorizing threats based on criteria such as adversary models, attack targets, and key security attributes (confidentiality, availability, and integrity). We delve into the underlying techniques of ML attacks in IoT environment, providing a critical evaluation of their mechanisms and impacts. Furthermore, our research thoroughly assesses 65 libraries, both author-contributed and third-party, evaluating their role in safeguarding model and data privacy. We emphasize the availability and usability of these libraries, aiming to arm the community with the necessary tools to bolster their defenses against the evolving threat landscape. Through our comprehensive review and analysis, this paper seeks to contribute to the ongoing discourse on ML-based IoT security, offering valuable insights and practical solutions to secure ML models and data in the rapidly expanding field of artificial intelligence in IoT.
- [1254] arXiv:2401.11724 (cross-list from cs.CV) [ pdf , other ]
-
Title: Augmenting Prototype Network with TransMix for Few-shot Hyperspectral Image ClassificationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Few-shot hyperspectral image classification aims to identify the classes of each pixel in the images by only marking few of these pixels. And in order to obtain the spatial-spectral joint features of each pixel, the fixed-size patches centering around each pixel are often used for classification. However, observing the classification results of existing methods, we found that boundary patches corresponding to the pixels which are located at the boundary of the objects in the hyperspectral images, are hard to classify. These boundary patchs are mixed with multi-class spectral information. Inspired by this, we propose to augment the prototype network with TransMix for few-shot hyperspectrial image classification(APNT). While taking the prototype network as the backbone, it adopts the transformer as feature extractor to learn the pixel-to-pixel relation and pay different attentions to different pixels. At the same time, instead of directly using the patches which are cut from the hyperspectral images for training, it randomly mixs up two patches to imitate the boundary patches and uses the synthetic patches to train the model, with the aim to enlarge the number of hard training samples and enhance their diversity. And by following the data agumentation technique TransMix, the attention returned by the transformer is also used to mix up the labels of two patches to generate better labels for synthetic patches. Compared with existing methods, the proposed method has demonstrated sate of the art performance and better robustness for few-shot hyperspectral image classification in our experiments.
- [1255] arXiv:2401.11731 (cross-list from cs.NI) [ pdf , other ]
-
Title: Fast and Scalable Network Slicing by Integrating Deep Learning with Lagrangian MethodsComments: 6 pages, 5 figures, IEEE Global Communications Conference 2023Subjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Network slicing is a key technique in 5G and beyond for efficiently supporting diverse services. Many network slicing solutions rely on deep learning to manage complex and high-dimensional resource allocation problems. However, deep learning models suffer limited generalization and adaptability to dynamic slicing configurations. In this paper, we propose a novel framework that integrates constrained optimization methods and deep learning models, resulting in strong generalization and superior approximation capability. Based on the proposed framework, we design a new neural-assisted algorithm to allocate radio resources to slices to maximize the network utility under inter-slice resource constraints. The algorithm exhibits high scalability, accommodating varying numbers of slices and slice configurations with ease. We implement the proposed solution in a system-level network simulator and evaluate its performance extensively by comparing it to state-of-the-art solutions including deep reinforcement learning approaches. The numerical results show that our solution obtains near-optimal quality-of-service satisfaction and promising generalization performance under different network slicing scenarios.
- [1256] arXiv:2401.11736 (cross-list from cs.LG) [ pdf , other ]
-
Title: Attention on Personalized Clinical Decision Support System: Federated Learning ApproachComments: Published in IEEE BigComp 2021Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Health management has become a primary problem as new kinds of diseases and complex symptoms are introduced to a rapidly growing modern society. Building a better and smarter healthcare infrastructure is one of the ultimate goals of a smart city. To the best of our knowledge, neural network models are already employed to assist healthcare professionals in achieving this goal. Typically, training a neural network requires a rich amount of data but heterogeneous and vulnerable properties of clinical data introduce a challenge for the traditional centralized network. Moreover, adding new inputs to a medical database requires re-training an existing model from scratch. To tackle these challenges, we proposed a deep learning-based clinical decision support system trained and managed under a federated learning paradigm. We focused on a novel strategy to guarantee the safety of patient privacy and overcome the risk of cyberattacks while enabling large-scale clinical data mining. As a result, we can leverage rich clinical data for training each local neural network without the need for exchanging the confidential data of patients. Moreover, we implemented the proposed scheme as a sequence-to-sequence model architecture integrating the attention mechanism. Thus, our objective is to provide a personalized clinical decision support system with evolvable characteristics that can deliver accurate solutions and assist healthcare professionals in medical diagnosing.
- [1257] arXiv:2401.11748 (cross-list from cs.CR) [ pdf , other ]
-
Title: GI-PIP: Do We Require Impractical Auxiliary Dataset for Gradient Inversion Attacks?Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Deep gradient inversion attacks expose a serious threat to Federated Learning (FL) by accurately recovering private data from shared gradients. However, the state-of-the-art heavily relies on impractical assumptions to access excessive auxiliary data, which violates the basic data partitioning principle of FL. In this paper, a novel method, Gradient Inversion Attack using Practical Image Prior (GI-PIP), is proposed under a revised threat model. GI-PIP exploits anomaly detection models to capture the underlying distribution from fewer data, while GAN-based methods consume significant more data to synthesize images. The extracted distribution is then leveraged to regulate the attack process as Anomaly Score loss. Experimental results show that GI-PIP achieves a 16.12 dB PSNR recovery using only 3.8% data of ImageNet, while GAN-based methods necessitate over 70%. Moreover, GI-PIP exhibits superior capability on distribution generalization compared to GAN-based methods. Our approach significantly alleviates the auxiliary data requirement on both amount and distribution in gradient inversion attacks, hence posing more substantial threat to real-world FL.
- [1258] arXiv:2401.11750 (cross-list from cs.LG) [ pdf , other ]
-
Title: AdaFGL: A New Paradigm for Federated Node Classification with Topology HeterogeneityComments: Accepted by ICDE 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Databases (cs.DB); Social and Information Networks (cs.SI)
Abstract: Recently, Federated Graph Learning (FGL) has attracted significant attention as a distributed framework based on graph neural networks, primarily due to its capability to break data silos. Existing FGL studies employ community split on the homophilous global graph by default to simulate federated semi-supervised node classification settings. Such a strategy assumes the consistency of topology between the multi-client subgraphs and the global graph, where connected nodes are highly likely to possess similar feature distributions and the same label. However, in real-world implementations, the varying perspectives of local data engineering result in various subgraph topologies, posing unique heterogeneity challenges in FGL. Unlike the well-known label Non-independent identical distribution (Non-iid) problems in federated learning, FGL heterogeneity essentially reveals the topological divergence among multiple clients, namely homophily or heterophily. To simulate and handle this unique challenge, we introduce the concept of structure Non-iid split and then present a new paradigm called \underline{Ada}ptive \underline{F}ederated \underline{G}raph \underline{L}earning (AdaFGL), a decoupled two-step personalized approach. To begin with, AdaFGL employs standard multi-client federated collaborative training to acquire the federated knowledge extractor by aggregating uploaded models in the final round at the server. Then, each client conducts personalized training based on the local subgraph and the federated knowledge extractor. Extensive experiments on the 12 graph benchmark datasets validate the superior performance of AdaFGL over state-of-the-art baselines. Specifically, in terms of test accuracy, our proposed AdaFGL outperforms baselines by significant margins of 3.24\% and 5.57\% on community split and structure Non-iid split, respectively.
- [1259] arXiv:2401.11755 (cross-list from cs.LG) [ pdf , other ]
-
Title: FedGTA: Topology-aware Averaging for Federated Graph LearningComments: Accepted by VLDB 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Databases (cs.DB); Social and Information Networks (cs.SI)
Abstract: Federated Graph Learning (FGL) is a distributed machine learning paradigm that enables collaborative training on large-scale subgraphs across multiple local systems. Existing FGL studies fall into two categories: (i) FGL Optimization, which improves multi-client training in existing machine learning models; (ii) FGL Model, which enhances performance with complex local models and multi-client interactions. However, most FGL optimization strategies are designed specifically for the computer vision domain and ignore graph structure, presenting dissatisfied performance and slow convergence. Meanwhile, complex local model architectures in FGL Models studies lack scalability for handling large-scale subgraphs and have deployment limitations. To address these issues, we propose Federated Graph Topology-aware Aggregation (FedGTA), a personalized optimization strategy that optimizes through topology-aware local smoothing confidence and mixed neighbor features. During experiments, we deploy FedGTA in 12 multi-scale real-world datasets with the Louvain and Metis split. This allows us to evaluate the performance and robustness of FedGTA across a range of scenarios. Extensive experiments demonstrate that FedGTA achieves state-of-the-art performance while exhibiting high scalability and efficiency. The experiment includes ogbn-papers100M, the most representative large-scale graph database so that we can verify the applicability of our method to large-scale graph learning. To the best of our knowledge, our study is the first to bridge large-scale graph learning with FGL using this optimization strategy, contributing to the development of efficient and scalable FGL methods.
- [1260] arXiv:2401.11772 (cross-list from cs.LG) [ pdf , other ]
-
Title: LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation LearningAuthors: Xunkai Li , Meihao Liao , Zhengyu Wu , Daohan Su , Wentao Zhang , Rong-Hua Li , Guoren WangComments: Accepted by VLDB 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Most existing graph neural networks (GNNs) are limited to undirected graphs, whose restricted scope of the captured relational information hinders their expressive capabilities and deployments in real-world scenarios. Compared with undirected graphs, directed graphs (digraphs) fit the demand for modeling more complex topological systems by capturing more intricate relationships between nodes, such as formulating transportation and financial networks. While some directed GNNs have been introduced, their inspiration mainly comes from deep learning architectures, which lead to redundant complexity and computation, making them inapplicable to large-scale databases. To address these issues, we propose LightDiC, a scalable variant of the digraph convolution based on the magnetic Laplacian. Since topology-related computations are conducted solely during offline pre-processing, LightDiC achieves exceptional scalability, enabling downstream predictions to be trained separately without incurring recursive computational costs. Theoretical analysis shows that LightDiC utilizes directed information to achieve message passing based on the complex field, which corresponds to the proximal gradient descent process of the Dirichlet energy optimization function from the perspective of digraph signal denoising, ensuring its expressiveness. Experimental results demonstrate that LightDiC performs comparably well or even outperforms other SOTA methods in various downstream tasks, with fewer learnable parameters and higher training efficiency. Notably, LightDiC is the first DiGNN to provide satisfactory results in the most representative large-scale database (ogbn-papers100M).
- [1261] arXiv:2401.11792 (cross-list from cs.RO) [ pdf , other ]
-
Title: Safe and Generalized end-to-end Autonomous Driving System with Reinforcement Learning and DemonstrationsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: An intelligent driving system should be capable of dynamically formulating appropriate driving strategies based on the current environment and vehicle status, while ensuring the security and reliability of the system. However, existing methods based on reinforcement learning and imitation learning suffer from low safety, poor generalization, and inefficient sampling. Additionally, they cannot accurately predict future driving trajectories, and the accurate prediction of future driving trajectories is a precondition for making optimal decisions. To solve these problems, in this paper, we introduce a Safe and Generalized end-to-end Autonomous Driving System (SGADS) for complex and various scenarios. Our SGADS incorporates variational inference with normalizing flows, enabling the intelligent vehicle to accurately predict future driving trajectories. Moreover, we propose the formulation of robust safety constraints. Furthermore, we combine reinforcement learning with demonstrations to augment search process of the agent. The experimental results demonstrate that our SGADS can significantly improve safety performance, exhibit strong generalization, and enhance the training efficiency of intelligent vehicles in complex urban scenarios compared to existing methods.
- [1262] arXiv:2401.11798 (cross-list from cs.LG) [ pdf , other ]
-
Title: Knowledge Distillation on Spatial-Temporal Graph Convolutional Network for Traffic PredictionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Efficient real-time traffic prediction is crucial for reducing transportation time. To predict traffic conditions, we employ a spatio-temporal graph neural network (ST-GNN) to model our real-time traffic data as temporal graphs. Despite its capabilities, it often encounters challenges in delivering efficient real-time predictions for real-world traffic data. Recognizing the significance of timely prediction due to the dynamic nature of real-time data, we employ knowledge distillation (KD) as a solution to enhance the execution time of ST-GNNs for traffic prediction. In this paper, We introduce a cost function designed to train a network with fewer parameters (the student) using distilled data from a complex network (the teacher) while maintaining its accuracy close to that of the teacher. We use knowledge distillation, incorporating spatial-temporal correlations from the teacher network to enable the student to learn the complex patterns perceived by the teacher. However, a challenge arises in determining the student network architecture rather than considering it inadvertently. To address this challenge, we propose an algorithm that utilizes the cost function to calculate pruning scores, addressing small network architecture search issues, and jointly fine-tunes the network resulting from each pruning stage using KD. Ultimately, we evaluate our proposed ideas on two real-world datasets, PeMSD7 and PeMSD8. The results indicate that our method can maintain the student's accuracy close to that of the teacher, even with the retention of only $3\%$ of network parameters.
- [1263] arXiv:2401.11810 (cross-list from cs.LG) [ pdf , other ]
-
Title: Generalization and Informativeness of Conformal PredictionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Abstract: The safe integration of machine learning modules in decision-making processes hinges on their ability to quantify uncertainty. A popular technique to achieve this goal is conformal prediction (CP), which transforms an arbitrary base predictor into a set predictor with coverage guarantees. While CP certifies the predicted set to contain the target quantity with a user-defined tolerance, it does not provide control over the average size of the predicted sets, i.e., over the informativeness of the prediction. In this work, a theoretical connection is established between the generalization properties of the base predictor and the informativeness of the resulting CP prediction sets. To this end, an upper bound is derived on the expected size of the CP set predictor that builds on generalization error bounds for the base predictor. The derived upper bound provides insights into the dependence of the average size of the CP set predictor on the amount of calibration data, the target reliability, and the generalization performance of the base predictor. The theoretical insights are validated using simple numerical regression and classification tasks.
- [1264] arXiv:2401.11814 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Symbrain: A large-scale dataset of MRI images for neonatal brain symmetry analysisAuthors: Arnaud Gucciardi , Safouane El Ghazouali , Francesca Venturini , Vida Groznik , Umberto MichelucciComments: 7 pages, 2 figures, Dataset Paper, Medical AISubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents an annotated dataset of brain MRI images designed to advance the field of brain symmetry study. Magnetic resonance imaging (MRI) has gained interest in analyzing brain symmetry in neonatal infants, and challenges remain due to the vast size differences between fetal and adult brains. Classification methods for brain structural MRI use scales and visual cues to assess hemisphere symmetry, which can help diagnose neonatal patients by comparing hemispheres and anatomical regions of interest in the brain. Using the Developing Human Connectome Project dataset, this work presents a dataset comprising cerebral images extracted as slices across selected portions of interest for clinical evaluation . All the extracted images are annotated with the brain's midline. All the extracted images are annotated with the brain's midline. From the assumption that a decrease in symmetry is directly related to possible clinical pathologies, the dataset can contribute to a more precise diagnosis because it can be used to train deep learning model application in neonatal cerebral MRI anomaly detection from postnatal infant scans thanks to computer vision. Such models learn to identify and classify anomalies by identifying potential asymmetrical patterns in medical MRI images. Furthermore, this dataset can contribute to the research and development of methods using the relative symmetry of the two brain hemispheres for crucial diagnosis and treatment planning.
- [1265] arXiv:2401.11817 (cross-list from cs.CL) [ pdf , other ]
-
Title: Hallucination is Inevitable: An Innate Limitation of Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Hallucination has been widely recognized to be a significant drawback for large language models (LLMs). There have been many works that attempt to reduce the extent of hallucination. These efforts have mostly been empirical so far, which cannot answer the fundamental question whether it can be completely eliminated. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined as inconsistencies between a computable LLM and a computable ground truth function. By employing results from learning theory, we show that LLMs cannot learn all of the computable functions and will therefore always hallucinate. Since the formal world is a part of the real world which is much more complicated, hallucinations are also inevitable for real world LLMs. Furthermore, for real world LLMs constrained by provable time complexity, we describe the hallucination-prone tasks and empirically validate our claims. Finally, using the formal world framework, we discuss the possible mechanisms and efficacies of existing hallucination mitigators as well as the practical implications on the safe deployment of LLMs.
- [1266] arXiv:2401.11819 (cross-list from cs.CL) [ pdf , other ]
-
Title: SuperCLUE-Math6: Graded Multi-Step Math Reasoning Benchmark for LLMs in ChineseComments: Dataset revised and finalized, results updated with new model; 8 pages, 7 figures, 4 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We introduce SuperCLUE-Math6(SC-Math6), a new benchmark dataset to evaluate the mathematical reasoning abilities of Chinese language models. SC-Math6 is designed as an upgraded Chinese version of the GSM8K dataset with enhanced difficulty, diversity, and application scope. It consists of over 2000 mathematical word problems requiring multi-step reasoning and providing natural language solutions. We propose an innovative scheme to quantify the reasoning capability of large models based on performance over problems with different reasoning steps. Experiments on 13 representative Chinese models demonstrate a clear stratification of reasoning levels, with top models like GPT-4 showing superior performance. SC-Math6 fills the gap in Chinese mathematical reasoning benchmarks and provides a comprehensive testbed to advance the intelligence of Chinese language models.
- [1267] arXiv:2401.11840 (cross-list from cs.LG) [ pdf , other ]
-
Title: Learning to Approximate Adaptive Kernel Convolution on GraphsComments: 15 pages, Accepted to AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Various Graph Neural Networks (GNNs) have been successful in analyzing data in non-Euclidean spaces, however, they have limitations such as oversmoothing, i.e., information becomes excessively averaged as the number of hidden layers increases. The issue stems from the intrinsic formulation of conventional graph convolution where the nodal features are aggregated from a direct neighborhood per layer across the entire nodes in the graph. As setting different number of hidden layers per node is infeasible, recent works leverage a diffusion kernel to redefine the graph structure and incorporate information from farther nodes. Unfortunately, such approaches suffer from heavy diagonalization of a graph Laplacian or learning a large transform matrix. In this regards, we propose a diffusion learning framework, where the range of feature aggregation is controlled by the scale of a diffusion kernel. For efficient computation, we derive closed-form derivatives of approximations of the graph convolution with respect to the scale, so that node-wise range can be adaptively learned. With a downstream classifier, the entire framework is made trainable in an end-to-end manner. Our model is tested on various standard datasets for node-wise classification for the state-of-the-art performance, and it is also validated on a real-world brain network data for graph classifications to demonstrate its practicality for Alzheimer classification.
- [1268] arXiv:2401.11844 (cross-list from cs.CV) [ pdf , other ]
-
Title: Adaptive Fusion of Multi-view Remote Sensing data for Optimal Sub-field Crop Yield PredictionAuthors: Francisco Mena , Deepak Pathak , Hiba Najjar , Cristhian Sanchez , Patrick Helber , Benjamin Bischke , Peter Habelitz , Miro Miranda , Jayanth Siddamsetty , Marlon Nuske , Marcela Charfuelan , Diego Arenas , Michaela Vollmer , Andreas DengelSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Accurate crop yield prediction is of utmost importance for informed decision-making in agriculture, aiding farmers, and industry stakeholders. However, this task is complex and depends on multiple factors, such as environmental conditions, soil properties, and management practices. Combining heterogeneous data views poses a fusion challenge, like identifying the view-specific contribution to the predictive task. We present a novel multi-view learning approach to predict crop yield for different crops (soybean, wheat, rapeseed) and regions (Argentina, Uruguay, and Germany). Our multi-view input data includes multi-spectral optical images from Sentinel-2 satellites and weather data as dynamic features during the crop growing season, complemented by static features like soil properties and topographic information. To effectively fuse the data, we introduce a Multi-view Gated Fusion (MVGF) model, comprising dedicated view-encoders and a Gated Unit (GU) module. The view-encoders handle the heterogeneity of data sources with varying temporal resolutions by learning a view-specific representation. These representations are adaptively fused via a weighted sum. The fusion weights are computed for each sample by the GU using a concatenation of the view-representations. The MVGF model is trained at sub-field level with 10 m resolution pixels. Our evaluations show that the MVGF outperforms conventional models on the same task, achieving the best results by incorporating all the data sources, unlike the usual fusion results in the literature. For Argentina, the MVGF model achieves an R2 value of 0.68 at sub-field yield prediction, while at field level evaluation (comparing field averages), it reaches around 0.80 across different countries. The GU module learned different weights based on the country and crop-type, aligning with the variable significance of each data source to the prediction task.
- [1269] arXiv:2401.11849 (cross-list from cs.LG) [ pdf , other ]
-
Title: Self-Labeling the Job Shop Scheduling ProblemSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Combinatorics (math.CO)
Abstract: In this work, we propose a Self-Supervised training strategy specifically designed for combinatorial problems. One of the main obstacles in applying supervised paradigms to such problems is the requirement of expensive target solutions as ground-truth, often produced with costly exact solvers. Inspired by Semi- and Self-Supervised learning, we show that it is possible to easily train generative models by sampling multiple solutions and using the best one according to the problem objective as a pseudo-label. In this way, we iteratively improve the model generation capability by relying only on its self-supervision, completely removing the need for optimality information. We prove the effectiveness of this Self-Labeling strategy on the Job Shop Scheduling (JSP), a complex combinatorial problem that is receiving much attention from the Reinforcement Learning community. We propose a generative model based on the well-known Pointer Network and train it with our strategy. Experiments on two popular benchmarks demonstrate the potential of this approach as the resulting models outperform constructive heuristics and current state-of-the-art Reinforcement Learning proposals.
- [1270] arXiv:2401.11851 (cross-list from cs.AR) [ pdf , other ]
-
Title: BETA: Binarized Energy-Efficient Transformer Accelerator at the EdgeComments: This paper is accepted by 2024 IEEE International Symposium on Circuits and Systems (ISCAS 2024)Subjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: Existing binary Transformers are promising in edge deployment due to their compact model size, low computational complexity, and considerable inference accuracy. However, deploying binary Transformers faces challenges on prior processors due to inefficient execution of quantized matrix multiplication (QMM) and the energy consumption overhead caused by multi-precision activations. To tackle the challenges above, we first develop a computation flow abstraction method for binary Transformers to improve QMM execution efficiency by optimizing the computation order. Furthermore, a binarized energy-efficient Transformer accelerator, namely BETA, is proposed to boost the efficient deployment at the edge. Notably, BETA features a configurable QMM engine, accommodating diverse activation precisions of binary Transformers and offering high-parallelism and high-speed for QMMs with impressive energy efficiency. Experimental results evaluated on ZCU102 FPGA show BETA achieves an average energy efficiency of 174 GOPS/W, which is 1.76~21.92x higher than prior FPGA-based accelerators, showing BETA's good potential for edge Transformer acceleration.
- [1271] arXiv:2401.11852 (cross-list from cs.CL) [ pdf , other ]
-
Title: The Right Model for the Job: An Evaluation of Legal Multi-Label Classification BaselinesAuthors: Martina Forster , Claudia Schulz , Prudhvi Nokku , Melicaalsadat Mirsafian , Jaykumar Kasundra , Stavroula SkylakiSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Multi-Label Classification (MLC) is a common task in the legal domain, where more than one label may be assigned to a legal document. A wide range of methods can be applied, ranging from traditional ML approaches to the latest Transformer-based architectures. In this work, we perform an evaluation of different MLC methods using two public legal datasets, POSTURE50K and EURLEX57K. By varying the amount of training data and the number of labels, we explore the comparative advantage offered by different approaches in relation to the dataset properties. Our findings highlight DistilRoBERTa and LegalBERT as performing consistently well in legal MLC with reasonable computational demands. T5 also demonstrates comparable performance while offering advantages as a generative model in the presence of changing label sets. Finally, we show that the CrossEncoder exhibits potential for notable macro-F1 score improvements, albeit with increased computational costs.
- [1272] arXiv:2401.11860 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Review of Physics-Informed Machine Learning Methods with Applications to Condition Monitoring and Anomaly DetectionComments: Paper has been submitted for review to the journal Expert Systems with Applications (December 31, 2023). 90 pages, 22 figures, 9 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: This study presents a comprehensive overview of PIML techniques in the context of condition monitoring. The central concept driving PIML is the incorporation of known physical laws and constraints into machine learning algorithms, enabling them to learn from available data while remaining consistent with physical principles. Through fusing domain knowledge with data-driven learning, PIML methods offer enhanced accuracy and interpretability in comparison to purely data-driven approaches. In this comprehensive survey, detailed examinations are performed with regard to the methodology by which known physical principles are integrated within machine learning frameworks, as well as their suitability for specific tasks within condition monitoring. Incorporation of physical knowledge into the ML model may be realized in a variety of methods, with each having its unique advantages and drawbacks. The distinct advantages and limitations of each methodology for the integration of physics within data-driven models are detailed, considering factors such as computational efficiency, model interpretability, and generalizability to different systems in condition monitoring and fault detection. Several case studies and works of literature utilizing this emerging concept are presented to demonstrate the efficacy of PIML in condition monitoring applications. From the literature reviewed, the versatility and potential of PIML in condition monitoring may be demonstrated. Novel PIML methods offer an innovative solution for addressing the complexities of condition monitoring and associated challenges. This comprehensive survey helps form the foundation for future work in the field. As the technology continues to advance, PIML is expected to play a crucial role in enhancing maintenance strategies, system reliability, and overall operational efficiency in engineering systems.
- [1273] arXiv:2401.11864 (cross-list from cs.CL) [ pdf , other ]
-
Title: Distilling Mathematical Reasoning Capabilities into Small Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This work addresses the challenge of democratizing advanced Large Language Models (LLMs) by compressing their mathematical reasoning capabilities into sub-billion parameter Small Language Models (SLMs) without compromising performance. We introduce Equation-of-Thought Distillation (EoTD), a novel technique that encapsulates the reasoning process into equation-based representations to construct an EoTD dataset for fine-tuning SLMs. Additionally, we propose the Ensemble Thoughts Distillation (ETD) framework to enhance the reasoning performance of SLMs. This involves creating a reasoning dataset with multiple thought processes, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Equation-of-Thought (EoT), and using it for fine-tuning. Our experimental findings demonstrate that EoTD significantly boosts the reasoning abilities of SLMs, while ETD enables these models to achieve state-of-the-art reasoning performance.
- [1274] arXiv:2401.11880 (cross-list from cs.CL) [ pdf , other ]
-
Title: PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System SafetyAuthors: Zaibin Zhang , Yongting Zhang , Lijun Li , Hongzhi Gao , Lijun Wang , Huchuan Lu , Feng Zhao , Yu Qiao , Jing ShaoSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Multiagent Systems (cs.MA)
Abstract: Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence. However, the potential misuse of this intelligence for malicious purposes presents significant risks. To date, comprehensive research on the safety issues associated with multi-agent systems remains limited. In this paper, we explore these concerns through the innovative lens of agent psychology, revealing that the dark psychological states of agents constitute a significant threat to safety. To tackle these concerns, we propose a comprehensive framework (PsySafe) grounded in agent psychology, focusing on three key areas: firstly, identifying how dark personality traits in agents can lead to risky behaviors; secondly, evaluating the safety of multi-agent systems from the psychological and behavioral perspectives, and thirdly, devising effective strategies to mitigate these risks. Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents' self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and dangerous behaviors. We anticipate that our framework and observations will provide valuable insights for further research into the safety of multi-agent systems. We will make our data and code publicly accessible at this https URL .
- [1275] arXiv:2401.11900 (cross-list from cs.SC) [ pdf , other ]
-
Title: Showing Proofs, Assessing Difficulty with GeoGebra DiscoveryAuthors: Zoltán Kovács (The Private University College of Education of the Diocese of Linz, Austria), Tomás Recio (Escuela Politécnica Superior, Universidad Antonio de Nebrija, Madrid, Spain), M. Pilar Vélez (Escuela Politécnica Superior, Universidad Antonio de Nebrija, Madrid, Spain)Comments: In Proceedings ADG 2023, arXiv:2401.10725Journal-ref: EPTCS 398, 2024, pp. 43-52Subjects: Symbolic Computation (cs.SC) ; Artificial Intelligence (cs.AI); Computational Geometry (cs.CG)
Abstract: In our contribution we describe some on-going improvements concerning the Automated Reasoning Tools developed in GeoGebra Discovery, providing different examples of the performance of these new features. We describe the new ShowProof command, that outputs both the sequence of the different steps performed by GeoGebra Discovery to confirm a certain statement, as well as a number intending to grade the difficulty or interest of the assertion. The proposal of this assessment measure, involving the comparison of the expression of the thesis (or conclusion) as a combination of the hypotheses, will be developed.
- [1276] arXiv:2401.11906 (cross-list from cs.SC) [ pdf , other ]
-
Title: Solving with GeoGebra Discovery an Austrian Mathematics Olympiad problem: Lessons LearnedAuthors: Belén Ariño-Morera (Departamento de Economía Financiera y Contabilidad, Universidad Rey Juan Carlos, Madrid, Spain), Zoltán Kovács (The Private University College of Education of the Diocese of Linz, Austria), Tomás Recio (Escuela Politécnica Superior, Universidad Antonio de Nebrija, Madrid, Spain), Piedad Tolmos (Departamento de Economía Financiera y Contabilidad, Universidad Rey Juan Carlos, Madrid, Spain)Comments: In Proceedings ADG 2023, arXiv:2401.10725Journal-ref: EPTCS 398, 2024, pp. 101-109Subjects: Symbolic Computation (cs.SC) ; Artificial Intelligence (cs.AI); Computational Geometry (cs.CG)
Abstract: We address, through the automated reasoning tools in GeoGebra Discovery, a problem from a regional phase of the Austrian Mathematics Olympiad 2023. Trying to solve this problem gives rise to four different kind of feedback: the almost instantaneous, automated solution of the proposed problem; the measure of its complexity, according to some recent proposals; the automated discovery of a generalization of the given assertion, showing that the same statement is true over more general polygons than those mentioned in the problem; and the difficulties associated to the analysis of the surprising and involved high number of degenerate cases that appear when using the LocusEquation command in this problem. In our communication we will describe and reflect on these diverse issues, enhancing its exemplar role for showing some of the advantages, problems, and current fields of development of GeoGebra Discovery.
- [1277] arXiv:2401.11911 (cross-list from cs.CL) [ pdf , other ]
-
Title: Blinded by Generated Contexts: How Language Models Merge Generated and Retrieved Contexts for Open-Domain QA?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: While auxiliary information has become a key to enhancing Large Language Models (LLMs), relatively little is known about how LLMs merge these contexts, specifically contexts generated by LLMs and those retrieved from external sources. To investigate this, we formulate a systematic framework to identify whether LLMs' responses, derived from the integration of generated and retrieved contexts, are attributed to either generated or retrieved contexts. To easily trace the origin of the response, we construct datasets with conflicting contexts, i.e., each question is paired with both generated and retrieved contexts, yet only one of them contains the correct answer. Our experiments reveal a significant bias in several LLMs (GPT-4/3.5 and Llama2) to favor generated contexts, even when they provide incorrect information. We further identify two key factors contributing to this bias: i) contexts generated by LLMs typically show greater similarity to the questions, increasing their likelihood of being selected; ii) the segmentation process used in retrieved contexts disrupts their completeness, thereby hindering their full utilization in LLMs. Our analysis enhances the understanding of how LLMs merge diverse contexts, offering valuable insights for advancing current augmentation methods for LLMs.
- [1278] arXiv:2401.11913 (cross-list from cs.CV) [ pdf , other ]
-
Title: Large receptive field strategy and important feature extraction strategy in 3D object detectionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The enhancement of 3D object detection is pivotal for precise environmental perception and improved task execution capabilities in autonomous driving. LiDAR point clouds, offering accurate depth information, serve as a crucial information for this purpose. Our study focuses on key challenges in 3D target detection. To tackle the challenge of expanding the receptive field of a 3D convolutional kernel, we introduce the Dynamic Feature Fusion Module (DFFM). This module achieves adaptive expansion of the 3D convolutional kernel's receptive field, balancing the expansion with acceptable computational loads. This innovation reduces operations, expands the receptive field, and allows the model to dynamically adjust to different object requirements. Simultaneously, we identify redundant information in 3D features. Employing the Feature Selection Module (FSM) quantitatively evaluates and eliminates non-important features, achieving the separation of output box fitting and feature extraction. This innovation enables the detector to focus on critical features, resulting in model compression, reduced computational burden, and minimized candidate frame interference. Extensive experiments confirm that both DFFM and FSM not only enhance current benchmarks, particularly in small target detection, but also accelerate network performance. Importantly, these modules exhibit effective complementarity.
- [1279] arXiv:2401.11944 (cross-list from cs.CL) [ pdf , other ]
-
Title: CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding BenchmarkAuthors: Ge Zhang , Xinrun Du , Bei Chen , Yiming Liang , Tongxu Luo , Tianyu Zheng , Kang Zhu , Yuyang Cheng , Chunpu Xu , Shuyue Guo , Haoran Zhang , Xingwei Qu , Junjie Wang , Ruibin Yuan , Yizhi Li , Zekun Wang , Yudong Liu , Yu-Hsuan Tsai , Fengji Zhang , Chenghua Lin , Wenhao Huang , Wenhu Chen , Jie FuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: As the capabilities of large multimodal models (LMMs) continue to advance, evaluating the performance of LMMs emerges as an increasing need. Additionally, there is an even larger gap in evaluating the advanced knowledge and reasoning abilities of LMMs in non-English contexts such as Chinese. We introduce CMMMU, a new Chinese Massive Multi-discipline Multimodal Understanding benchmark designed to evaluate LMMs on tasks demanding college-level subject knowledge and deliberate reasoning in a Chinese context. CMMMU is inspired by and strictly follows the annotation and analysis pattern of MMMU.
CMMMU includes 12k manually collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering, like its companion, MMMU. These questions span 30 subjects and comprise 39 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures.
CMMMU focuses on complex perception and reasoning with domain-specific knowledge in the Chinese context. We evaluate 11 open-source LLMs and one proprietary GPT-4V(ision). Even GPT-4V only achieves accuracies of 42%, indicating a large space for improvement. CMMMU will boost the community to build the next-generation LMMs towards expert artificial intelligence and promote the democratization of LMMs by providing diverse language contexts. - [1280] arXiv:2401.11963 (cross-list from cs.NE) [ pdf , other ]
-
Title: Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive SurveySubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for optimization, has demonstrated remarkable performance advancements. By fusing the strengths of both approaches, ERL has emerged as a promising research direction. This survey offers a comprehensive overview of the diverse research branches in ERL. Specifically, we systematically summarize recent advancements in relevant algorithms and identify three primary research directions: EA-assisted optimization of RL, RL-assisted optimization of EA, and synergistic optimization of EA and RL. Following that, we conduct an in-depth analysis of each research direction, organizing multiple research branches. We elucidate the problems that each branch aims to tackle and how the integration of EA and RL addresses these challenges. In conclusion, we discuss potential challenges and prospective future research directions across various research directions. To facilitate researchers in delving into ERL, we organize the algorithms and codes involved on this https URL .
- [1281] arXiv:2401.12007 (cross-list from cs.LG) [ pdf , other ]
-
Title: Tensor-view Topological Graph Neural NetworkComments: Accepted at AISTATS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph classification is an important learning task for graph-structured data. Graph neural networks (GNNs) have recently gained growing attention in graph learning and have shown significant improvements in many important graph problems. Despite their state-of-the-art performances, existing GNNs only use local information from a very limited neighborhood around each node, suffering from loss of multi-modal information and overheads of excessive computation. To address these issues, we propose a novel Tensor-view Topological Graph Neural Network (TTG-NN), a class of simple yet effective topological deep learning built upon persistent homology, graph convolution, and tensor operations. This new method incorporates tensor learning to simultaneously capture Tensor-view Topological (TT), as well as Tensor-view Graph (TG) structural information on both local and global levels. Computationally, to fully exploit graph topology and structure, we propose two flexible TT and TG representation learning modules that disentangle feature tensor aggregation and transformation and learn to preserve multi-modal structure with less computation. Theoretically, we derive high probability bounds on both the out-of-sample and in-sample mean squared approximation errors for our proposed Tensor Transformation Layer (TTL). Real data experiments show that the proposed TTG-NN outperforms 20 state-of-the-art methods on various graph benchmarks.
- [1282] arXiv:2401.12014 (cross-list from cs.LG) [ pdf , other ]
-
Title: Robustness to distribution shifts of compressed networks for edge devicesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: It is necessary to develop efficient DNNs deployed on edge devices with limited computation resources. However, the compressed networks often execute new tasks in the target domain, which is different from the source domain where the original network is trained. It is important to investigate the robustness of compressed networks in two types of data distribution shifts: domain shifts and adversarial perturbations. In this study, we discover that compressed models are less robust to distribution shifts than their original networks. Interestingly, larger networks are more vulnerable to losing robustness than smaller ones, even when they are compressed to a similar size as the smaller networks. Furthermore, compact networks obtained by knowledge distillation are much more robust to distribution shifts than pruned networks. Finally, post-training quantization is a reliable method for achieving significant robustness to distribution shifts, and it outperforms both pruned and distilled models in terms of robustness.
- [1283] arXiv:2401.12024 (cross-list from cs.RO) [ pdf , other ]
-
Title: Multimodal Visual-Tactile Representation Learning through Self-Supervised Contrastive Pre-TrainingSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The rapidly evolving field of robotics necessitates methods that can facilitate the fusion of multiple modalities. Specifically, when it comes to interacting with tangible objects, effectively combining visual and tactile sensory data is key to understanding and navigating the complex dynamics of the physical world, enabling a more nuanced and adaptable response to changing environments. Nevertheless, much of the earlier work in merging these two sensory modalities has relied on supervised methods utilizing datasets labeled by humans.This paper introduces MViTac, a novel methodology that leverages contrastive learning to integrate vision and touch sensations in a self-supervised fashion. By availing both sensory inputs, MViTac leverages intra and inter-modality losses for learning representations, resulting in enhanced material property classification and more adept grasping prediction. Through a series of experiments, we showcase the effectiveness of our method and its superiority over existing state-of-the-art self-supervised and supervised techniques. In evaluating our methodology, we focus on two distinct tasks: material classification and grasping success prediction. Our results indicate that MViTac facilitates the development of improved modality encoders, yielding more robust representations as evidenced by linear probing assessments.
- [1284] arXiv:2401.12032 (cross-list from cs.HC) [ pdf , other ]
-
Title: MINT: A wrapper to make multi-modal and multi-image AI models interactiveAuthors: Jan Freyberg , Abhijit Guha Roy , Terry Spitz , Beverly Freeman , Mike Schaekermann , Patricia Strachan , Eva Schnider , Renee Wong , Dale R Webster , Alan Karthikesalingam , Yun Liu , Krishnamurthy Dvijotham , Umesh TelangComments: 15 pages, 7 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: During the diagnostic process, doctors incorporate multimodal information including imaging and the medical history - and similarly medical AI development has increasingly become multimodal. In this paper we tackle a more subtle challenge: doctors take a targeted medical history to obtain only the most pertinent pieces of information; how do we enable AI to do the same? We develop a wrapper method named MINT (Make your model INTeractive) that automatically determines what pieces of information are most valuable at each step, and ask for only the most useful information. We demonstrate the efficacy of MINT wrapping a skin disease prediction model, where multiple images and a set of optional answers to $25$ standard metadata questions (i.e., structured medical history) are used by a multi-modal deep network to provide a differential diagnosis. We show that MINT can identify whether metadata inputs are needed and if so, which question to ask next. We also demonstrate that when collecting multiple images, MINT can identify if an additional image would be beneficial, and if so, which type of image to capture. We showed that MINT reduces the number of metadata and image inputs needed by 82% and 36.2% respectively, while maintaining predictive performance. Using real-world AI dermatology system data, we show that needing fewer inputs can retain users that may otherwise fail to complete the system submission and drop off without a diagnosis. Qualitative examples show MINT can closely mimic the step-by-step decision making process of a clinical workflow and how this is different for straight forward cases versus more difficult, ambiguous cases. Finally we demonstrate how MINT is robust to different underlying multi-model classifiers and can be easily adapted to user requirements without significant model re-training.
- [1285] arXiv:2401.12051 (cross-list from cs.CV) [ pdf , other ]
-
Title: CloSe: A 3D Clothing Segmentation Dataset and ModelSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: 3D Clothing modeling and datasets play crucial role in the entertainment, animation, and digital fashion industries. Existing work often lacks detailed semantic understanding or uses synthetic datasets, lacking realism and personalization. To address this, we first introduce CloSe-D: a novel large-scale dataset containing 3D clothing segmentation of 3167 scans, covering a range of 18 distinct clothing classes. Additionally, we propose CloSe-Net, the first learning-based 3D clothing segmentation model for fine-grained segmentation from colored point clouds. CloSe-Net uses local point features, body-clothing correlation, and a garment-class and point features-based attention module, improving performance over baselines and prior work. The proposed attention module enables our model to learn appearance and geometry-dependent clothing prior from data. We further validate the efficacy of our approach by successfully segmenting publicly available datasets of people in clothing. We also introduce CloSe-T, a 3D interactive tool for refining segmentation labels. Combining the tool with CloSe-T in a continual learning setup demonstrates improved generalization on real-world data. Dataset, model, and tool can be found at this https URL .
- [1286] arXiv:2401.12070 (cross-list from cs.CL) [ pdf , other ]
-
Title: Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated TextAuthors: Abhimanyu Hans , Avi Schwarzschild , Valeriia Cherepanova , Hamid Kazemi , Aniruddha Saha , Micah Goldblum , Jonas Geiping , Tom GoldsteinComments: 20 pages, code available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
- [1287] arXiv:2401.12086 (cross-list from cs.CL) [ pdf , other ]
-
Title: West-of-N: Synthetic Preference Generation for Improved Reward ModelingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The success of reinforcement learning from human feedback (RLHF) in language model alignment is strongly dependent on the quality of the underlying reward model. In this paper, we present a novel approach to improve reward model quality by generating synthetic preference data, thereby augmenting the training dataset with on-policy, high-quality preference pairs. Motivated by the promising results of Best-of-N sampling strategies in language model training, we extend their application to reward model training. This results in a self-training strategy to generate preference pairs by selecting the best and worst candidates in a pool of responses to a given query. Empirically, we find that this approach improves the performance of any reward model, with an effect comparable to the addition of a similar quantity of human preference data. This work opens up new avenues of research for improving RLHF for language model alignment, by offering synthetic preference generation as a solution to reward modeling challenges.
- [1288] arXiv:2401.12113 (cross-list from cs.LG) [ pdf , other ]
-
Title: Extracting Formulae in Many-Valued Logic from Deep Neural NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Abstract: We propose a new perspective on deep ReLU networks, namely as circuit counterparts of Lukasiewicz infinite-valued logic -- a many-valued (MV) generalization of Boolean logic. An algorithm for extracting formulae in MV logic from deep ReLU networks is presented. As the algorithm applies to networks with general, in particular also real-valued, weights, it can be used to extract logical formulae from deep ReLU networks trained on data.
- [1289] arXiv:2401.12132 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Evaluation of QCNN-LSTM for Disability Forecasting in Multiple Sclerosis Using Sequential Multisequence MRISubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Image and Video Processing (eess.IV)
Abstract: Introduction Quantum Convolutional Neural Network (QCNN)-Long Short-Term Memory (LSTM) models were studied to provide sequential relationships for each timepoint in MRIs of patients with Multiple Sclerosis (MS). In this pilot study, we compared three QCNN-LSTM models for binary classification of MS disability benchmarked against classical neural network architectures. Our hypothesis is that quantum models will provide competitive performance. Methods Matrix Product State (MPS), reverse Multistate Entanglement Renormalization Ansatz (MERA), and Tree-Tensor Network (TTN) circuits were paired with LSTM layer to process near-annual MRI data of patients diagnosed with MS. These were benchmarked against a Visual Geometry Group (VGG)-LSTM and a Video Vision Transformer (ViViT). Predicted logits were measured against ground truth labels of each patient's Extended Disability Severity Score (EDSS) using binary cross-entropy loss. Training/validation/holdout testing was partitioned using 5-fold cross validation with a total split of 60:20:20. Levene's test of variance was used to measure statistical difference and Student's t-test for paired model differences in mean. Results The MPS-LSTM, reverse MERA-LSTM, and TTN-LSTM had holdout testing ROC-AUC of 0.70, 0.77, and 0.81, respectively (p-value 0.915). VGG16-LSTM and ViViT performed similarly with ROC-AUC of 0.73 and 0.77, respectively (p-value 0.631). Overall variance and mean were not statistically significant (p-value 0.713), however, time to train was significantly faster for the QCNN-LSTMs (39.4 sec per fold vs. 224 and 218, respectively, p-value <0.001). Conclusion QCNN-LSTM models perform competitively to their classical counterparts with greater efficiency in train time. Clinically, these can add value in terms of efficiency to time-dependent deep learning prediction of disease progression based upon medical imaging.
- [1290] arXiv:2401.12164 (cross-list from cs.CV) [ pdf , other ]
-
Title: Semi-supervised segmentation of land cover images using nonlinear canonical correlation analysis with multiple features and t-SNESubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Image segmentation is a clustering task whereby each pixel is assigned a cluster label. Remote sensing data usually consists of multiple bands of spectral images in which there exist semantically meaningful land cover subregions, co-registered with other source data such as LIDAR (LIght Detection And Ranging) data, where available. This suggests that, in order to account for spatial correlation between pixels, a feature vector associated with each pixel may be a vectorized tensor representing the multiple bands and a local patch as appropriate. Similarly, multiple types of texture features based on a pixel's local patch would also be beneficial for encoding locally statistical information and spatial variations, without necessarily labelling pixel-wise a large amount of ground truth, then training a supervised model, which is sometimes impractical. In this work, by resorting to label only a small quantity of pixels, a new semi-supervised segmentation approach is proposed. Initially, over all pixels, an image data matrix is created in high dimensional feature space. Then, t-SNE projects the high dimensional data onto 3D embedding. By using radial basis functions as input features, which use the labelled data samples as centres, to pair with the output class labels, a modified canonical correlation analysis algorithm, referred to as RBF-CCA, is introduced which learns the associated projection matrix via the small labelled data set. The associated canonical variables, obtained for the full image, are applied by k-means clustering algorithm. The proposed semi-supervised RBF-CCA algorithm has been implemented on several remotely sensed multispectral images, demonstrating excellent segmentation results.
- [1291] arXiv:2401.12170 (cross-list from cs.LO) [ pdf , ps , other ]
-
Title: Natural Strategic Ability in Stochastic Multi-Agent SystemsComments: Extended version of the paper accepted at AAAI 2024Subjects: Logic in Computer Science (cs.LO) ; Artificial Intelligence (cs.AI)
Abstract: Strategies synthesized using formal methods can be complex and often require infinite memory, which does not correspond to the expected behavior when trying to model Multi-Agent Systems (MAS). To capture such behaviors, natural strategies are a recently proposed framework striking a balance between the ability of agents to strategize with memory and the model-checking complexity, but until now has been restricted to fully deterministic settings. For the first time, we consider the probabilistic temporal logics PATL and PATL* under natural strategies (NatPATL and NatPATL*, resp.). As main result we show that, in stochastic MAS, NatPATL model-checking is NP-complete when the active coalition is restricted to deterministic strategies. We also give a 2NEXPTIME complexity result for NatPATL* with the same restriction. In the unrestricted case, we give an EXPSPACE complexity for NatPATL and 3EXPSPACE complexity for NatPATL*.
- [1292] arXiv:2401.12176 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Broiler-Net: A Deep Convolutional Framework for Broiler Behavior Analysis in Poultry HousesComments: 11 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Detecting anomalies in poultry houses is crucial for maintaining optimal chicken health conditions, minimizing economic losses and bolstering profitability. This paper presents a novel real-time framework for analyzing chicken behavior in cage-free poultry houses to detect abnormal behaviors. Specifically, two significant abnormalities, namely inactive broiler and huddling behavior, are investigated in this study. The proposed framework comprises three key steps: (1) chicken detection utilizing a state-of-the-art deep learning model, (2) tracking individual chickens across consecutive frames with a fast tracker module, and (3) detecting abnormal behaviors within the video stream. Experimental studies are conducted to evaluate the efficacy of the proposed algorithm in accurately assessing chicken behavior. The results illustrate that our framework provides a precise and efficient solution for real-time anomaly detection, facilitating timely interventions to maintain chicken health and enhance overall productivity on poultry farms. Github: this https URL
- [1293] arXiv:2401.12178 (cross-list from cs.CL) [ pdf , other ]
-
Title: In-Context Learning for Extreme Multi-Label ClassificationAuthors: Karel D'Oosterlinck , Omar Khattab , François Remy , Thomas Demeester , Chris Develder , Christopher PottsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone, as language models (LMs) might lack prior knowledge about the precise classes or how to assign them, and it is generally infeasible to demonstrate every class in a prompt. We propose a general program, $\texttt{Infer--Retrieve--Rank}$, that defines multi-step interactions between LMs and retrievers to efficiently tackle such problems. We implement this program using the $\texttt{DSPy}$ programming model, which specifies in-context systems in a declarative manner, and use $\texttt{DSPy}$ optimizers to tune it towards specific datasets by bootstrapping only tens of few-shot examples. Our primary extreme classification program, optimized separately for each task, attains state-of-the-art results across three benchmarks (HOUSE, TECH, TECHWOLF). We apply the same program to a benchmark with vastly different characteristics and attain competitive performance as well (BioDEX). Unlike prior work, our proposed solution requires no finetuning, is easily applicable to new tasks, alleviates prompt engineering, and requires only tens of labeled examples. Our code is public at this https URL .
- [1294] arXiv:2401.12179 (cross-list from cs.SD) [ pdf , other ]
-
Title: DITTO: Diffusion Inference-Time T-Optimization for Music GenerationSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Abstract: We propose Diffusion Inference-Time T-Optimization (DITTO), a general-purpose frame-work for controlling pre-trained text-to-music diffusion models at inference-time via optimizing initial noise latents. Our method can be used to optimize through any differentiable feature matching loss to achieve a target (stylized) output and leverages gradient checkpointing for memory efficiency. We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control - all without ever fine-tuning the underlying model. When we compare our approach against related training, guidance, and optimization-based methods, we find DITTO achieves state-of-the-art performance on nearly all tasks, including outperforming comparable approaches on controllability, audio quality, and computational efficiency, thus opening the door for high-quality, flexible, training-free control of diffusion models. Sound examples can be found at this https URL .
- [1295] arXiv:2401.12181 (cross-list from cs.LG) [ pdf , other ]
-
Title: Universal Neurons in GPT2 Language ModelsAuthors: Wes Gurnee , Theo Horsley , Zifan Carl Guo , Tara Rezaei Kheirkhah , Qinyi Sun , Will Hathaway , Neel Nanda , Dimitris BertsimasSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: A basic question within the emerging field of mechanistic interpretability is the degree to which neural networks learn the same underlying mechanisms. In other words, are neural mechanisms universal across different models? In this work, we study the universality of individual neurons across GPT2 models trained from different initial random seeds, motivated by the hypothesis that universal neurons are likely to be interpretable. In particular, we compute pairwise correlations of neuron activations over 100 million tokens for every neuron pair across five different seeds and find that 1-5\% of neurons are universal, that is, pairs of neurons which consistently activate on the same inputs. We then study these universal neurons in detail, finding that they usually have clear interpretations and taxonomize them into a small number of neuron families. We conclude by studying patterns in neuron weights to establish several universal functional roles of neurons in simple circuits: deactivating attention heads, changing the entropy of the next token distribution, and predicting the next token to (not) be within a particular set.
- [1296] arXiv:2401.12187 (cross-list from cs.LG) [ pdf , other ]
-
Title: WARM: On the Benefits of Weight Averaged Reward ModelsAuthors: Alexandre Ramé , Nino Vieillard , Léonard Hussenot , Robert Dadashi , Geoffrey Cideron , Olivier Bachem , Johan FerretComments: 14 pages, 9 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Aligning large language models (LLMs) with human preferences through reinforcement learning (RLHF) can lead to reward hacking, where LLMs exploit failures in the reward model (RM) to achieve seemingly high rewards without meeting the underlying objectives. We identify two primary challenges when designing RMs to mitigate reward hacking: distribution shifts during the RL process and inconsistencies in human preferences. As a solution, we propose Weight Averaged Reward Models (WARM), first fine-tuning multiple RMs, then averaging them in the weight space. This strategy follows the observation that fine-tuned weights remain linearly mode connected when sharing the same pre-training. By averaging weights, WARM improves efficiency compared to the traditional ensembling of predictions, while improving reliability under distribution shifts and robustness to preference inconsistencies. Our experiments on summarization tasks, using best-of-N and RL methods, shows that WARM improves the overall quality and alignment of LLM predictions; for example, a policy RL fine-tuned with WARM has a 79.4% win rate against a policy RL fine-tuned with a single RM.
- [1297] arXiv:2401.12192 (cross-list from cs.CL) [ pdf , other ]
-
Title: Text Embedding Inversion Security for Multilingual Language ModelsComments: 18 pages, 17 Tables, 6 FiguresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Textual data is often represented as realnumbered embeddings in NLP, particularly with the popularity of large language models (LLMs) and Embeddings as a Service (EaaS). However, storing sensitive information as embeddings can be vulnerable to security breaches, as research shows that text can be reconstructed from embeddings, even without knowledge of the underlying model. While defence mechanisms have been explored, these are exclusively focused on English, leaving other languages vulnerable to attacks. This work explores LLM security through multilingual embedding inversion. We define the problem of black-box multilingual and cross-lingual inversion attacks, and thoroughly explore their potential implications. Our findings suggest that multilingual LLMs may be more vulnerable to inversion attacks, in part because English based defences may be ineffective. To alleviate this, we propose a simple masking defense effective for both monolingual and multilingual models. This study is the first to investigate multilingual inversion attacks, shedding light on the differences in attacks and defenses across monolingual and multilingual settings.
- [1298] arXiv:2401.12202 (cross-list from cs.RO) [ pdf , other ]
-
Title: OK-Robot: What Really Matters in Integrating Open-Knowledge Models for RoboticsAuthors: Peiqi Liu , Yaswanth Orru , Jay Vakil , Chris Paxton , Nur Muhammad Mahi Shafiullah , Lerrel PintoComments: Github repo: this https URLSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Remarkable progress has been made in recent years in the fields of vision, language, and robotics. We now have vision models capable of recognizing objects based on language queries, navigation systems that can effectively control mobile systems, and grasping models that can handle a wide range of objects. Despite these advancements, general-purpose applications of robotics still lag behind, even though they rely on these fundamental capabilities of recognition, navigation, and grasping. In this paper, we adopt a systems-first approach to develop a new Open Knowledge-based robotics framework called OK-Robot. By combining Vision-Language Models (VLMs) for object detection, navigation primitives for movement, and grasping primitives for object manipulation, OK-Robot offers a integrated solution for pick-and-drop operations without requiring any training. To evaluate its performance, we run OK-Robot in 10 real-world home environments. The results demonstrate that OK-Robot achieves a 58.5% success rate in open-ended pick-and-drop tasks, representing a new state-of-the-art in Open Vocabulary Mobile Manipulation (OVMM) with nearly 1.8x the performance of prior work. On cleaner, uncluttered environments, OK-Robot's performance increases to 82%. However, the most important insight gained from OK-Robot is the critical role of nuanced details when combining Open Knowledge systems like VLMs with robotic modules. Videos of our experiments and code are available on our website: this https URL
- [1299] arXiv:2401.12205 (cross-list from cs.LG) [ pdf , other ]
-
Title: Retrieval-Guided Reinforcement Learning for Boolean Circuit MinimizationComments: Accepted in ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Abstract: Logic synthesis, a pivotal stage in chip design, entails optimizing chip specifications encoded in hardware description languages like Verilog into highly efficient implementations using Boolean logic gates. The process involves a sequential application of logic minimization heuristics (``synthesis recipe"), with their arrangement significantly impacting crucial metrics such as area and delay. Addressing the challenge posed by the broad spectrum of design complexities - from variations of past designs (e.g., adders and multipliers) to entirely novel configurations (e.g., innovative processor instructions) - requires a nuanced `synthesis recipe` guided by human expertise and intuition. This study conducts a thorough examination of learning and search techniques for logic synthesis, unearthing a surprising revelation: pre-trained agents, when confronted with entirely novel designs, may veer off course, detrimentally affecting the search trajectory. We present ABC-RL, a meticulously tuned $\alpha$ parameter that adeptly adjusts recommendations from pre-trained agents during the search process. Computed based on similarity scores through nearest neighbor retrieval from the training dataset, ABC-RL yields superior synthesis recipes tailored for a wide array of hardware designs. Our findings showcase substantial enhancements in the Quality-of-result (QoR) of synthesized circuits, boasting improvements of up to 24.8% compared to state-of-the-art techniques. Furthermore, ABC-RL achieves an impressive up to 9x reduction in runtime (iso-QoR) when compared to current state-of-the-art methodologies.
- [1300] arXiv:2401.12223 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: The Global Impact of AI-Artificial Intelligence: Recent Advances and Future Directions, A ReviewAuthors: Chandregowda PachegowdaComments: 4 pagesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Artificial intelligence (AI) is an emerging technology that has the potential to transform many aspects of society, including the economy, healthcare, and transportation. This article synthesizes recent research literature on the global impact of AI, exploring its potential benefits and risks. The article highlights the implications of AI, including its impact on economic, ethical, social, security & privacy, and job displacement aspects. It discusses the ethical concerns surrounding AI development, including issues of bias, security, and privacy violations. To ensure the responsible development and deployment of AI, collaboration between government, industry, and academia is essential. The article concludes by emphasizing the importance of public engagement and education to promote awareness and understanding of AI's impact on society at large.
- [1301] arXiv:2401.12224 (cross-list from cs.AR) [ pdf , other ]
-
Title: LLM4EDA: Emerging Progress in Large Language Models for Electronic Design AutomationAuthors: Ruizhe Zhong , Xingbo Du , Shixiong Kai , Zhentao Tang , Siyuan Xu , Hui-Ling Zhen , Jianye Hao , Qiang Xu , Mingxuan Yuan , Junchi YanComments: 15 pages, 4 figuresSubjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: Driven by Moore's Law, the complexity and scale of modern chip design are increasing rapidly. Electronic Design Automation (EDA) has been widely applied to address the challenges encountered in the full chip design process. However, the evolution of very large-scale integrated circuits has made chip design time-consuming and resource-intensive, requiring substantial prior expert knowledge. Additionally, intermediate human control activities are crucial for seeking optimal solutions. In system design stage, circuits are usually represented with Hardware Description Language (HDL) as a textual format. Recently, Large Language Models (LLMs) have demonstrated their capability in context understanding, logic reasoning and answer generation. Since circuit can be represented with HDL in a textual format, it is reasonable to question whether LLMs can be leveraged in the EDA field to achieve fully automated chip design and generate circuits with improved power, performance, and area (PPA). In this paper, we present a systematic study on the application of LLMs in the EDA field, categorizing it into the following cases: 1) assistant chatbot, 2) HDL and script generation, and 3) HDL verification and analysis. Additionally, we highlight the future research direction, focusing on applying LLMs in logic synthesis, physical design, multi-modal feature extraction and alignment of circuits. We collect relevant papers up-to-date in this field via the following link: this https URL .
- [1302] arXiv:2401.12244 (cross-list from cs.CV) [ pdf , other ]
-
Title: Large-scale Reinforcement Learning for Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Text-to-image diffusion models are a class of deep generative models that have demonstrated an impressive capacity for high-quality image generation. However, these models are susceptible to implicit biases that arise from web-scale text-image training pairs and may inaccurately model aspects of images we care about. This can result in suboptimal samples, model bias, and images that do not align with human ethics and preferences. In this paper, we present an effective scalable algorithm to improve diffusion models using Reinforcement Learning (RL) across a diverse set of reward functions, such as human preference, compositionality, and fairness over millions of images. We illustrate how our approach substantially outperforms existing methods for aligning diffusion models with human preferences. We further illustrate how this substantially improves pretrained Stable Diffusion (SD) models, generating samples that are preferred by humans 80.3% of the time over those from the base SD model while simultaneously improving both the composition and diversity of generated samples.
- [1303] arXiv:2401.12255 (cross-list from cs.CR) [ pdf , other ]
-
Title: Instructional Fingerprinting of Large Language ModelsComments: Accepted at NAACL 2024; 30 pagesSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: The exorbitant cost of training Large language models (LLMs) from scratch makes it essential to fingerprint the models to protect intellectual property via ownership authentication and to ensure downstream users and developers comply with their license terms (e.g. restricting commercial use). In this study, we present a pilot study on LLM fingerprinting as a form of very lightweight instruction tuning. Model publisher specifies a confidential private key and implants it as an instruction backdoor that causes the LLM to generate specific text when the key is present. Results on 11 popularly-used LLMs showed that this approach is lightweight and does not affect the normal behavior of the model. It also prevents publisher overclaim, maintains robustness against fingerprint guessing and parameter-efficient training, and supports multi-stage fingerprinting akin to MIT License. Code is available in this https URL .
- [1304] arXiv:2401.12258 (cross-list from cs.MA) [ pdf , other ]
-
Title: Emergent Dominance Hierarchies in Reinforcement Learning AgentsSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Abstract: Modern Reinforcement Learning (RL) algorithms are able to outperform humans in a wide variety of tasks. Multi-agent reinforcement learning (MARL) settings present additional challenges, and successful cooperation in mixed-motive groups of agents depends on a delicate balancing act between individual and group objectives. Social conventions and norms, often inspired by human institutions, are used as tools for striking this balance.
In this paper, we examine a fundamental, well-studied social convention that underlies cooperation in both animal and human societies: dominance hierarchies.
We adapt the ethological theory of dominance hierarchies to artificial agents, borrowing the established terminology and definitions with as few amendments as possible. We demonstrate that populations of RL agents, operating without explicit programming or intrinsic rewards, can invent, learn, enforce, and transmit a dominance hierarchy to new populations. The dominance hierarchies that emerge have a similar structure to those studied in chickens, mice, fish, and other species. - [1305] arXiv:2401.12259 (cross-list from cs.MA) [ pdf , ps , other ]
-
Title: Agreement Technologies for Coordination in Smart CitiesJournal-ref: Applied Sciences, Volume 8, Issue 5 (2018)Subjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: Many challenges in today's society can be tackled by distributed open systems. This is particularly true for domains that are commonly perceived under the umbrella of smart cities, such as intelligent transportation, smart energy grids, or participative governance. When designing computer applications for these domains, it is necessary to account for the fact that the elements of such systems, often called software agents, are usually made by different designers and act on behalf of particular stakeholders. Furthermore, it is unknown at design time when such agents will enter or leave the system, and what interests new agents will represent. To instil coordination in such systems is particularly demanding, as usually only part of them can be directly controlled at runtime. Agreement technologies refer to a sandbox of tools and mechanisms for the development of such open multiagent systems, which are based on the notion of agreement. In this paper, we argue that agreement technologies are a suitable means for achieving coordination in smart city domains, and back our claim through examples of several real-world applications.
- [1306] arXiv:2401.12261 (cross-list from cs.CR) [ pdf , other ]
-
Title: Analyzing the Quality Attributes of AI Vision Models in Open Repositories Under Adversarial AttacksSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: As AI models rapidly evolve, they are frequently released to open repositories, such as HuggingFace. It is essential to perform quality assurance validation on these models before integrating them into the production development lifecycle. In addition to evaluating efficiency in terms of balanced accuracy and computing costs, adversarial attacks are potential threats to the robustness and explainability of AI models. Meanwhile, XAI applies algorithms that approximate inputs to outputs post-hoc to identify the contributing features. Adversarial perturbations may also degrade the utility of XAI explanations that require further investigation. In this paper, we present an integrated process designed for downstream evaluation tasks, including validating AI model accuracy, evaluating robustness with benchmark perturbations, comparing explanation utility, and assessing overhead. We demonstrate an evaluation scenario involving six computer vision models, which include CNN-based, Transformer-based, and hybrid architectures, three types of perturbations, and five XAI methods, resulting in ninety unique combinations. The process reveals the explanation utility among the XAI methods in terms of the identified key areas responding to the adversarial perturbation. The process produces aggregated results that illustrate multiple attributes of each AI model.
- [1307] arXiv:2401.12273 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: The Ethics of Interaction: Mitigating Security Threats in LLMsSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: This paper comprehensively explores the ethical challenges arising from security threats to Language Learning Models (LLMs). These intricate digital repositories are increasingly integrated into our daily lives, making them prime targets for attacks that can compromise their training data and the confidentiality of their data sources. The paper delves into the nuanced ethical repercussions of such security threats on society and individual privacy. We scrutinize five major threats: prompt injection, jailbreaking, Personal Identifiable Information (PII) exposure, sexually explicit content, and hate based content, going beyond mere identification to assess their critical ethical consequences and the urgency they create for robust defensive strategies. The escalating reliance on LLMs underscores the crucial need for ensuring these systems operate within the bounds of ethical norms, particularly as their misuse can lead to significant societal and individual harm. We propose conceptualizing and developing an evaluative tool tailored for LLMs, which would serve a dual purpose, guiding developers and designers in preemptive fortification of backend systems and scrutinizing the ethical dimensions of LLM chatbot responses during the testing phase. By comparing LLM responses with those expected from humans in a moral context, we aim to discern the degree to which AI behaviors align with the ethical values held by a broader society. Ultimately, this paper not only underscores the ethical troubles presented by LLMs, it also highlights a path toward cultivating trust in these systems.
- [1308] arXiv:2401.12275 (cross-list from cs.RO) [ pdf , other ]
-
Title: Multi-Agent Dynamic Relational Reasoning for Social Robot NavigationComments: 19 pages, 8 figures, 6 tablesSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Abstract: Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning. While modeling pairwise relations has been widely studied in multi-agent interacting systems, the ability to capture larger-scale group-wise activities is limited. In this paper, we propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures, and we demonstrate its effectiveness for multi-agent trajectory prediction and social robot navigation. In addition to the edges between pairs of nodes (i.e., agents), we propose to infer hyperedges that adaptively connect multiple nodes to enable group-wise reasoning in an unsupervised manner. Our approach infers dynamically evolving relation graphs and hypergraphs to capture the evolution of relations, which the trajectory predictor employs to generate future states. Meanwhile, we propose to regularize the sharpness and sparsity of the learned relations and the smoothness of the relation evolution, which proves to enhance training stability and model performance. The proposed approach is validated on synthetic crowd simulations and real-world benchmark datasets. Experiments demonstrate that the approach infers reasonable relations and achieves state-of-the-art prediction performance. In addition, we present a deep reinforcement learning (DRL) framework for social robot navigation, which incorporates relational reasoning and trajectory prediction systematically. In a group-based crowd simulation, our method outperforms the strongest baseline by a significant margin in terms of safety, efficiency, and social compliance in dense, interactive scenarios.
- [1309] arXiv:2401.12292 (cross-list from cs.CL) [ pdf , other ]
-
Title: GRATH: Gradual Self-Truthifying for Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Truthfulness is paramount for large language models (LLMs) as they are increasingly deployed in real-world applications. However, existing LLMs still struggle with generating truthful content, as evidenced by their modest performance on benchmarks like TruthfulQA. To address this issue, we propose GRAdual self-truTHifying (GRATH), a novel post-processing method to enhance truthfulness of LLMs. GRATH utilizes out-of-domain question prompts to generate pairwise truthfulness training data with each pair containing a question and its correct and incorrect answers, and then optimizes the model via direct preference optimization (DPO) to learn from the truthfulness difference between answer pairs. GRATH iteratively refines truthfulness data and updates the model, leading to a gradual improvement in model truthfulness in a self-supervised manner. Empirically, we evaluate GRATH using different 7B-LLMs and compare with LLMs with similar or even larger sizes on benchmark datasets. Our results show that GRATH effectively improves LLMs' truthfulness without compromising other core capabilities. Notably, GRATH achieves state-of-the-art performance on TruthfulQA, with MC1 accuracy of 54.71% and MC2 accuracy of 69.10%, which even surpass those on 70B-LLMs.
- [1310] arXiv:2401.12326 (cross-list from cs.CL) [ pdf , other ]
-
Title: Fine-tuning Large Language Models for Multigenerator, Multidomain, and Multilingual Machine-Generated Text DetectionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: SemEval-2024 Task 8 introduces the challenge of identifying machine-generated texts from diverse Large Language Models (LLMs) in various languages and domains. The task comprises three subtasks: binary classification in monolingual and multilingual (Subtask A), multi-class classification (Subtask B), and mixed text detection (Subtask C). This paper focuses on Subtask A & B. Each subtask is supported by three datasets for training, development, and testing. To tackle this task, two methods: 1) using traditional machine learning (ML) with natural language preprocessing (NLP) for feature extraction, and 2) fine-tuning LLMs for text classification. The results show that transformer models, particularly LoRA-RoBERTa, exceed traditional ML methods in effectiveness, with majority voting being particularly effective in multilingual contexts for identifying machine-generated texts.
- [1311] arXiv:2401.12340 (cross-list from cs.CV) [ pdf , other ]
-
Title: Contrastive Learning and Cycle Consistency-based Transductive Transfer Learning for Target AnnotationComments: This Paper is Accepted in IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS. This Arxiv version is an older version than the reviewed versionSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Abstract: Annotating automatic target recognition (ATR) is a highly challenging task, primarily due to the unavailability of labeled data in the target domain. Hence, it is essential to construct an optimal target domain classifier by utilizing the labeled information of the source domain images. The transductive transfer learning (TTL) method that incorporates a CycleGAN-based unpaired domain translation network has been previously proposed in the literature for effective ATR annotation. Although this method demonstrates great potential for ATR, it severely suffers from lower annotation performance, higher Fréchet Inception Distance (FID) score, and the presence of visual artifacts in the synthetic images. To address these issues, we propose a hybrid contrastive learning base unpaired domain translation (H-CUT) network that achieves a significantly lower FID score. It incorporates both attention and entropy to emphasize the domain-specific region, a noisy feature mixup module to generate high variational synthetic negative patches, and a modulated noise contrastive estimation (MoNCE) loss to reweight all negative patches using optimal transport for better performance. Our proposed contrastive learning and cycle-consistency-based TTL (C3TTL) framework consists of two H-CUT networks and two classifiers. It simultaneously optimizes cycle-consistency, MoNCE, and identity losses. In C3TTL, two H-CUT networks have been employed through a bijection mapping to feed the reconstructed source domain images into a pretrained classifier to guide the optimal target domain classifier. Extensive experimental analysis conducted on three ATR datasets demonstrates that the proposed C3TTL method is effective in annotating civilian and military vehicles, as well as ship targets.
- [1312] arXiv:2401.12344 (cross-list from cs.CV) [ pdf , other ]
-
Title: OCT-SelfNet: A Self-Supervised Framework with Multi-Modal Datasets for Generalized and Robust Retinal Disease DetectionComments: 12 pages, 7 figures, 6 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Despite the revolutionary impact of AI and the development of locally trained algorithms, achieving widespread generalized learning from multi-modal data in medical AI remains a significant challenge. This gap hinders the practical deployment of scalable medical AI solutions. Addressing this challenge, our research contributes a self-supervised robust machine learning framework, OCT-SelfNet, for detecting eye diseases using optical coherence tomography (OCT) images. In this work, various data sets from various institutions are combined enabling a more comprehensive range of representation. Our method addresses the issue using a two-phase training approach that combines self-supervised pretraining and supervised fine-tuning with a mask autoencoder based on the SwinV2 backbone by providing a solution for real-world clinical deployment. Extensive experiments on three datasets with different encoder backbones, low data settings, unseen data settings, and the effect of augmentation show that our method outperforms the baseline model, Resnet-50 by consistently attaining AUC-ROC performance surpassing 77% across all tests, whereas the baseline model exceeds 54%. Moreover, in terms of the AUC-PR metric, our proposed method exceeded 42%, showcasing a substantial increase of at least 10% in performance compared to the baseline, which exceeded only 33%. This contributes to our understanding of our approach's potential and emphasizes its usefulness in clinical settings.
- [1313] arXiv:2401.12375 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Development of an NLP-driven computer-based test guide for visually impaired studentsComments: 10 pages, 6 figuresJournal-ref: International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE) Vol. 12, Issue 9, September 2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, advancements in Natural Language Processing (NLP) techniques have revolutionized the field of accessibility and exclusivity of testing, particularly for visually impaired students (VIS). CBT has shown in years back its relevance in terms of administering exams electronically, making the test process easier, providing quicker and more accurate results, and offering greater flexibility and accessibility for candidates. Yet, its relevance was not felt by the visually impaired students as they cannot access printed documents. Hence, in this paper, we present an NLP-driven Computer-Based Test guide for visually impaired students. It employs a speech technology pre-trained methods to provide real-time assistance and support to visually impaired students. The system utilizes NLP technologies to convert the text-based questions and the associated options in a machine-readable format. Subsequently, the speech technology pre-trained model processes the converted text enabling the VIS to comprehend and analyze the content. Furthermore, we validated that this pre-trained model is not perverse by testing for accuracy using sample audio datasets labels (A, B, C, D, E, F, G) to compare with the voice recordings obtained from 20 VIS which is been predicted by the system to attain values for precision, recall, and F1-scores. These metrics are used to assess the performance of the pre-trained model and have indicated that it is proficient enough to give its better performance to the evaluated system. The methodology adopted for this system is Object Oriented Analysis and Design Methodology (OOADM) where Objects are discussed and built by modeling real-world instances.
- [1314] arXiv:2401.12392 (cross-list from cs.RO) [ pdf , other ]
-
Title: Evaluating Roadside Perception for Autonomous Vehicles: Insights from Field TestingAuthors: Rusheng Zhang , Depu Meng , Shengyin Shen , Tinghan Wang , Tai Karir , Michael Maile , Henry X. LiuComments: 6 figures, 8 tables, 14 pagesSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Roadside perception systems are increasingly crucial in enhancing traffic safety and facilitating cooperative driving for autonomous vehicles. Despite rapid technological advancements, a major challenge persists for this newly arising field: the absence of standardized evaluation methods and benchmarks for these systems. This limitation hampers the ability to effectively assess and compare the performance of different systems, thus constraining progress in this vital field. This paper introduces a comprehensive evaluation methodology specifically designed to assess the performance of roadside perception systems. Our methodology encompasses measurement techniques, metric selection, and experimental trial design, all grounded in real-world field testing to ensure the practical applicability of our approach.
We applied our methodology in Mcity\footnote{\url{ this https URL }}, a controlled testing environment, to evaluate various off-the-shelf perception systems. This approach allowed for an in-depth comparative analysis of their performance in realistic scenarios, offering key insights into their respective strengths and limitations. The findings of this study are poised to inform the development of industry-standard benchmarks and evaluation methods, thereby enhancing the effectiveness of roadside perception system development and deployment for autonomous vehicles. We anticipate that this paper will stimulate essential discourse on standardizing evaluation methods for roadside perception systems, thus pushing the frontiers of this technology. Furthermore, our results offer both academia and industry a comprehensive understanding of the capabilities of contemporary infrastructure-based perception systems. - [1315] arXiv:2401.12393 (cross-list from cs.DB) [ pdf , other ]
-
Title: A Learning-based Declarative Privacy-Preserving Framework for Federated Data ManagementAuthors: Hong Guan , Summer Gautier , Deepti Gupta , Rajan Hari Ambrish , Yancheng Wang , Harsha Lakamsani , Dhanush Giriyan , Saajan Maslanka , Chaowei Xiao , Yingzhen Yang , Jia ZouSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI)
Abstract: It is challenging to balance the privacy and accuracy for federated query processing over multiple private data silos. In this work, we will demonstrate an end-to-end workflow for automating an emerging privacy-preserving technique that uses a deep learning model trained using the Differentially-Private Stochastic Gradient Descent (DP-SGD) algorithm to replace portions of actual data to answer a query. Our proposed novel declarative privacy-preserving workflow allows users to specify "what private information to protect" rather than "how to protect". Under the hood, the system automatically chooses query-model transformation plans as well as hyper-parameters. At the same time, the proposed workflow also allows human experts to review and tune the selected privacy-preserving mechanism for audit/compliance, and optimization purposes.
- [1316] arXiv:2401.12406 (cross-list from cs.CL) [ pdf , other ]
-
Title: Enhancing In-context Learning via Linear Probe CalibrationAuthors: Momin Abbas , Yi Zhou , Parikshit Ram , Nathalie Baracaldo , Horst Samulowitz , Theodoros Salonidis , Tianyi ChenComments: Accepted at AISTATS2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In-context learning (ICL) is a new paradigm for natural language processing that utilizes Generative Pre-trained Transformer (GPT)-like models. This approach uses prompts that include in-context demonstrations to generate the corresponding output for a new query input. However, applying ICL in real cases does not scale with the number of samples, and lacks robustness to different prompt templates and demonstration permutations. In this paper, we first show that GPT-like models using ICL result in unreliable predictions based on a new metric based on Shannon entropy. Then, to solve this problem, we propose a new technique called the Linear Probe Calibration (LinC), a method that calibrates the model's output probabilities, resulting in reliable predictions and improved performance, while requiring only minimal additional samples (as few as five labeled data samples). LinC significantly enhances the ICL test performance of GPT models on various benchmark datasets, with an average improvement of up to 21%, and up to a 50% improvement in some cases, and significantly boosts the performance of PEFT methods, especially in the low resource regime. Moreover, LinC achieves lower expected calibration error, and is highly robust to varying label proportions, prompt templates, and demonstration permutations. Our code is available at \url{ this https URL }.
- [1317] arXiv:2401.12421 (cross-list from cs.CV) [ pdf , other ]
-
Title: AdaEmbed: Semi-supervised Domain Adaptation in the Embedding SpaceSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Semi-supervised domain adaptation (SSDA) presents a critical hurdle in computer vision, especially given the frequent scarcity of labeled data in real-world settings. This scarcity often causes foundation models, trained on extensive datasets, to underperform when applied to new domains. AdaEmbed, our newly proposed methodology for SSDA, offers a promising solution to these challenges. Leveraging the potential of unlabeled data, AdaEmbed facilitates the transfer of knowledge from a labeled source domain to an unlabeled target domain by learning a shared embedding space. By generating accurate and uniform pseudo-labels based on the established embedding space, the model overcomes the limitations of conventional SSDA, thus enhancing performance significantly. Our method's effectiveness is validated through extensive experiments on benchmark datasets such as DomainNet, Office-Home, and VisDA-C, where AdaEmbed consistently outperforms all the baselines, setting a new state of the art for SSDA. With its straightforward implementation and high data efficiency, AdaEmbed stands out as a robust and pragmatic solution for real-world scenarios, where labeled data is scarce. To foster further research and application in this area, we are sharing the codebase of our unified framework for semi-supervised domain adaptation.
- [1318] arXiv:2401.12451 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Methods and strategies for improving the novel view synthesis quality of neural radiation fieldJournal-ref: IEEE ACCESS 12 (2024) 50548-50555Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Neural Radiation Field (NeRF) technology can learn a 3D implicit model of a scene from 2D images and synthesize realistic novel view images. This technology has received widespread attention from the industry and has good application prospects. In response to the problem that the rendering quality of NeRF images needs to be improved, many researchers have proposed various methods to improve the rendering quality in the past three years. The latest relevant papers are classified and reviewed, the technical principles behind quality improvement are analyzed, and the future evolution direction of quality improvement methods is discussed. This study can help researchers quickly understand the current state and evolutionary context of technology in this field, which is helpful in inspiring the development of more efficient algorithms and promoting the application of NeRF technology in related fields.
- [1319] arXiv:2401.12455 (cross-list from cs.MA) [ pdf , other ]
-
Title: Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure managementSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: We present a multi-agent Deep Reinforcement Learning (DRL) framework for managing large transportation infrastructure systems over their life-cycle. Life-cycle management of such engineering systems is a computationally intensive task, requiring appropriate sequential inspection and maintenance decisions able to reduce long-term risks and costs, while dealing with different uncertainties and constraints that lie in high-dimensional spaces. To date, static age- or condition-based maintenance methods and risk-based or periodic inspection plans have mostly addressed this class of optimization problems. However, optimality, scalability, and uncertainty limitations are often manifested under such approaches. The optimization problem in this work is cast in the framework of constrained Partially Observable Markov Decision Processes (POMDPs), which provides a comprehensive mathematical basis for stochastic sequential decision settings with observation uncertainties, risk considerations, and limited resources. To address significantly large state and action spaces, a Deep Decentralized Multi-agent Actor-Critic (DDMAC) DRL method with Centralized Training and Decentralized Execution (CTDE), termed as DDMAC-CTDE is developed. The performance strengths of the DDMAC-CTDE method are demonstrated in a generally representative and realistic example application of an existing transportation network in Virginia, USA. The network includes several bridge and pavement components with nonstationary degradation, agency-imposed constraints, and traffic delay and risk considerations. Compared to traditional management policies for transportation networks, the proposed DDMAC-CTDE method vastly outperforms its counterparts. Overall, the proposed algorithmic framework provides near optimal solutions for transportation infrastructure management under real-world constraints and complexities.
- [1320] arXiv:2401.12456 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Exploration and Improvement of Nerf-based 3D Scene Editing TechniquesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: NeRF's high-quality scene synthesis capability was quickly accepted by scholars in the years after it was proposed, and significant progress has been made in 3D scene representation and synthesis. However, the high computational cost limits intuitive and efficient editing of scenes, making NeRF's development in the scene editing field facing many challenges. This paper reviews the preliminary explorations of scholars on NeRF in the scene or object editing field in recent years, mainly changing the shape and texture of scenes or objects in new synthesized scenes; through the combination of residual models such as GaN and Transformer with NeRF, the generalization ability of NeRF scene editing has been further expanded, including realizing real-time new perspective editing feedback, multimodal editing of text synthesized 3D scenes, 4D synthesis performance, and in-depth exploration in light and shadow editing, initially achieving optimization of indirect touch editing and detail representation in complex scenes. Currently, most NeRF editing methods focus on the touch points and materials of indirect points, but when dealing with more complex or larger 3D scenes, it is difficult to balance accuracy, breadth, efficiency, and quality. Overcoming these challenges may become the direction of future NeRF 3D scene editing technology.
- [1321] arXiv:2401.12461 (cross-list from cs.CL) [ pdf , other ]
-
Title: Fast Adversarial Training against Textual Adversarial AttacksComments: 4 pages, 4 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Many adversarial defense methods have been proposed to enhance the adversarial robustness of natural language processing models. However, most of them introduce additional pre-set linguistic knowledge and assume that the synonym candidates used by attackers are accessible, which is an ideal assumption. We delve into adversarial training in the embedding space and propose a Fast Adversarial Training (FAT) method to improve the model robustness in the synonym-unaware scenario from the perspective of single-step perturbation generation and perturbation initialization. Based on the observation that the adversarial perturbations crafted by single-step and multi-step gradient ascent are similar, FAT uses single-step gradient ascent to craft adversarial examples in the embedding space to expedite the training process. Based on the observation that the perturbations generated on the identical training sample in successive epochs are similar, FAT fully utilizes historical information when initializing the perturbation. Extensive experiments demonstrate that FAT significantly boosts the robustness of BERT models in the synonym-unaware scenario, and outperforms the defense baselines under various attacks with character-level and word-level modifications.
- [1322] arXiv:2401.12470 (cross-list from cs.LG) [ pdf , other ]
-
Title: Reinforcement Learning for Graph Coloring: Understanding the Power and Limits of Non-Label Invariant RepresentationsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Register allocation is one of the most important problems for modern compilers. With a practically unlimited number of user variables and a small number of CPU registers, assigning variables to registers without conflicts is a complex task. This work demonstrates the use of casting the register allocation problem as a graph coloring problem. Using technologies such as PyTorch and OpenAI Gymnasium Environments we will show that a Proximal Policy Optimization model can learn to solve the graph coloring problem. We will also show that the labeling of a graph is critical to the performance of the model by taking the matrix representation of a graph and permuting it. We then test the model's effectiveness on each of these permutations and show that it is not effective when given a relabeling of the same graph. Our main contribution lies in showing the need for label reordering invariant representations of graphs for machine learning models to achieve consistent performance.
- [1323] arXiv:2401.12478 (cross-list from cs.LG) [ pdf , other ]
-
Title: Mini-batch Submodular MaximizationAuthors: Gregory SchwartzmanSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
Abstract: We present the first mini-batch algorithm for maximizing a non-negative monotone decomposable submodular function, $F=\sum_{i=1}^N f^i$, under a set of constraints. We improve over the sparsifier based approach both in theory and in practice. We experimentally observe that our algorithm generates solutions that are far superior to those generated by the sparsifier based approach.
- [1324] arXiv:2401.12485 (cross-list from cs.LG) [ pdf , other ]
-
Title: Adiabatic Quantum Support Vector MachinesAuthors: Prasanna Date , Dong Jun Woun , Kathleen Hamilton , Eduardo A. Coello Perez , Mayanka Chandra Shekhar , Francisco Rios , John Gounley , In-Saeng Suh , Travis Humble , Georgia TourassiSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Quantum Physics (quant-ph); Machine Learning (stat.ML)
Abstract: Adiabatic quantum computers can solve difficult optimization problems (e.g., the quadratic unconstrained binary optimization problem), and they seem well suited to train machine learning models. In this paper, we describe an adiabatic quantum approach for training support vector machines. We show that the time complexity of our quantum approach is an order of magnitude better than the classical approach. Next, we compare the test accuracy of our quantum approach against a classical approach that uses the Scikit-learn library in Python across five benchmark datasets (Iris, Wisconsin Breast Cancer (WBC), Wine, Digits, and Lambeq). We show that our quantum approach obtains accuracies on par with the classical approach. Finally, we perform a scalability study in which we compute the total training times of the quantum approach and the classical approach with increasing number of features and number of data points in the training dataset. Our scalability results show that the quantum approach obtains a 3.5--4.5 times speedup over the classical approach on datasets with many (millions of) features.
- [1325] arXiv:2401.12489 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Unsupervised Learning Method for the Wave Equation Based on Finite Difference Residual Constraints LossComments: in Chinese languageSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The wave equation is an important physical partial differential equation, and in recent years, deep learning has shown promise in accelerating or replacing traditional numerical methods for solving it. However, existing deep learning methods suffer from high data acquisition costs, low training efficiency, and insufficient generalization capability for boundary conditions. To address these issues, this paper proposes an unsupervised learning method for the wave equation based on finite difference residual constraints. We construct a novel finite difference residual constraint based on structured grids and finite difference methods, as well as an unsupervised training strategy, enabling convolutional neural networks to train without data and predict the forward propagation process of waves. Experimental results show that finite difference residual constraints have advantages over physics-informed neural networks (PINNs) type physical information constraints, such as easier fitting, lower computational costs, and stronger source term generalization capability, making our method more efficient in training and potent in application.
- [1326] arXiv:2401.12491 (cross-list from cs.CL) [ pdf , other ]
-
Title: Assessing and Understanding Creativity in Large Language ModelsAuthors: Yunpu Zhao , Rui Zhang , Wenyi Li , Di Huang , Jiaming Guo , Shaohui Peng , Yifan Hao , Yuanbo Wen , Xing Hu , Zidong Du , Qi Guo , Ling Li , Yunji ChenSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In the field of natural language processing, the rapid development of large language model (LLM) has attracted more and more attention. LLMs have shown a high level of creativity in various tasks, but the methods for assessing such creativity are inadequate. The assessment of LLM creativity needs to consider differences from humans, requiring multi-dimensional measurement while balancing accuracy and efficiency. This paper aims to establish an efficient framework for assessing the level of creativity in LLMs. By adapting the modified Torrance Tests of Creative Thinking, the research evaluates the creative performance of various LLMs across 7 tasks, emphasizing 4 criteria including Fluency, Flexibility, Originality, and Elaboration. In this context, we develop a comprehensive dataset of 700 questions for testing and an LLM-based evaluation method. In addition, this study presents a novel analysis of LLMs' responses to diverse prompts and role-play situations. We found that the creativity of LLMs primarily falls short in originality, while excelling in elaboration. Besides, the use of prompts and the role-play settings of the model significantly influence creativity. Additionally, the experimental results also indicate that collaboration among multiple LLMs can enhance originality. Notably, our findings reveal a consensus between human evaluations and LLMs regarding the personality traits that influence creativity. The findings underscore the significant impact of LLM design on creativity and bridges artificial intelligence and human creativity, offering insights into LLMs' creativity and potential applications.
- [1327] arXiv:2401.12492 (cross-list from cs.CL) [ pdf , other ]
-
Title: Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Incorporating human context into language models is the next frontier for human-centered natural language processing. Currently, two pre-training methods exist: group-wise attributes (e.g., over-45-year-olds) or individual traits. Group attributes are coarse -- not all 45-year-olds write the same way -- while modeling individual traits allows for a more personalized representation, but requires more complex modeling and data. So far, it is unclear which pre-training approach benefits what tasks. We compare pre-training models with human context via 1) group attributes, 2) individual users, and 3) a combined approach on 5 user- and document-level tasks. We find that pre-training with both group and individual features significantly improves the two user-level regression tasks like age estimation and personality assessment. Pre-training on individual users significantly improves the three document-level classification tasks like stance and topic detection. It even does well for downstream tasks without historical user data. Our results suggest both approaches have specific use cases, opening new avenues for human-centered language modeling.
- [1328] arXiv:2401.12513 (cross-list from cs.CV) [ pdf , other ]
-
Title: Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLRSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Purpose: The capacity to isolate and recognize individual characters from facsimile images of papyrus manuscripts yields rich opportunities for digital analysis. For this reason the `ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri' was held as part of the 17th International Conference on Document Analysis and Recognition. This paper discusses our submission to the competition.
Methods: We used an ensemble of YOLOv8 models to detect and classify individual characters and employed two different approaches for refining the character predictions, including a transformer based DeiT approach and a ResNet-50 model trained on a large corpus of unlabelled data using SimCLR, a self-supervised learning method.
Results: Our submission won the recognition challenge with a mAP of 42.2%, and was runner-up in the detection challenge with a mean average precision (mAP) of 51.4%. At the more relaxed intersection over union threshold of 0.5, we achieved the highest mean average precision and mean average recall results for both detection and classification.
Conclusion: The results demonstrate the potential for these techniques for automated character recognition on historical manuscripts. We ran the prediction pipeline on more than 4,500 images from the Oxyrhynchus Papyri to illustrate the utility of our approach, and we release the results publicly in multiple formats. - [1329] arXiv:2401.12522 (cross-list from cs.CL) [ pdf , other ]
-
Title: BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language ModelsComments: An appendix has been included. Source code at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) commonly employ autoregressive generation during inference, leading to high memory bandwidth demand and consequently extended latency. To mitigate this inefficiency, we present Bi-directional Tuning for lossless Acceleration (BiTA), an innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. Inspired by the concept of prompt tuning, we enhance LLMs with a parameter-efficient design called bi-directional tuning for the capability in semi-autoregressive generation. Employing efficient tree-based decoding, the models perform draft candidate generation and verification in parallel, ensuring outputs identical to their autoregressive counterparts under greedy sampling. BiTA serves as a lightweight plug-in module, seamlessly boosting the inference efficiency of existing LLMs without requiring additional assistance models or incurring significant extra memory costs. Applying the proposed BiTA, LLaMA-2-70B-Chat achieves a 2.7$\times$ speedup on the MT-Bench benchmark. Extensive experiments confirm our method surpasses state-of-the-art acceleration techniques.
- [1330] arXiv:2401.12532 (cross-list from cs.LG) [ pdf , other ]
-
Title: DAFA: Distance-Aware Fair Adversarial TrainingComments: Accepted to ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The disparity in accuracy between classes in standard training is amplified during adversarial training, a phenomenon termed the robust fairness problem. Existing methodologies aimed to enhance robust fairness by sacrificing the model's performance on easier classes in order to improve its performance on harder ones. However, we observe that under adversarial attacks, the majority of the model's predictions for samples from the worst class are biased towards classes similar to the worst class, rather than towards the easy classes. Through theoretical and empirical analysis, we demonstrate that robust fairness deteriorates as the distance between classes decreases. Motivated by these insights, we introduce the Distance-Aware Fair Adversarial training (DAFA) methodology, which addresses robust fairness by taking into account the similarities between classes. Specifically, our method assigns distinct loss weights and adversarial margins to each class and adjusts them to encourage a trade-off in robustness among similar classes. Experimental results across various datasets demonstrate that our method not only maintains average robust accuracy but also significantly improves the worst robust accuracy, indicating a marked improvement in robust fairness compared to existing methods.
- [1331] arXiv:2401.12533 (cross-list from cs.LG) [ pdf , other ]
-
Title: Near-Optimal Algorithms for Constrained k-Center Clustering with Instance-level Background KnowledgeSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Center-based clustering has attracted significant research interest from both theory and practice. In many practical applications, input data often contain background knowledge that can be used to improve clustering results. In this work, we build on widely adopted $k$-center clustering and model its input background knowledge as must-link (ML) and cannot-link (CL) constraint sets. However, most clustering problems including $k$-center are inherently $\mathcal{NP}$-hard, while the more complex constrained variants are known to suffer severer approximation and computation barriers that significantly limit their applicability. By employing a suite of techniques including reverse dominating sets, linear programming (LP) integral polyhedron, and LP duality, we arrive at the first efficient approximation algorithm for constrained $k$-center with the best possible ratio of 2. We also construct competitive baseline algorithms and empirically evaluate our approximation algorithm against them on a variety of real datasets. The results validate our theoretical findings and demonstrate the great advantages of our algorithm in terms of clustering cost, clustering quality, and running time.
- [1332] arXiv:2401.12554 (cross-list from cs.DC) [ pdf , other ]
-
Title: Can Large Language Models Write Parallel Code?Subjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI)
Abstract: Large language models are increasingly becoming a popular tool for software development. Their ability to model and generate source code has been demonstrated in a variety of contexts, including code completion, summarization, translation, and lookup. However, they often struggle to generate code for complex programs. In this paper, we study the capabilities of state-of-the-art language models to generate parallel code. In order to evaluate language models, we create a benchmark, ParEval, consisting of prompts that represent 420 different coding tasks. We use ParEval to evaluate the effectiveness of several state-of-the-art open- and closed-source language models on these tasks. We introduce novel metrics for evaluating the performance of generated code, and use them to explore how well each LLM performs for 12 different computational problem types and six different parallel programming models.
- [1333] arXiv:2401.12576 (cross-list from cs.CL) [ pdf , other ]
-
Title: LLMCheckup: Conversational Examination of Large Language Models via Interpretability Tools and Self-ExplanationsAuthors: Qianli Wang , Tatiana Anikina , Nils Feldhus , Josef van Genabith , Leonhard Hennig , Sebastian MöllerComments: Accepted to NAACL 2024 HCI+NLP workshop; camera-ready versionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Interpretability tools that offer explanations in the form of a dialogue have demonstrated their efficacy in enhancing users' understanding (Slack et al., 2023; Shen et al., 2023), as one-off explanations may fall short in providing sufficient information to the user. Current solutions for dialogue-based explanations, however, often require external tools and modules and are not easily transferable to tasks they were not designed for. With LLMCheckup, we present an easily accessible tool that allows users to chat with any state-of-the-art large language model (LLM) about its behavior. We enable LLMs to generate explanations and perform user intent recognition without fine-tuning, by connecting them with a broad spectrum of Explainable AI (XAI) methods, including white-box explainability tools such as feature attributions, and self-explanations (e.g., for rationale generation). LLM-based (self-)explanations are presented as an interactive dialogue that supports follow-up questions and generates suggestions. LLMCheckupprovides tutorials for operations available in the system, catering to individuals with varying levels of expertise in XAI and supporting multiple input modalities. We introduce a new parsing strategy that substantially enhances the user intent recognition accuracy of the LLM. Finally, we showcase LLMCheckup for the tasks of fact checking and commonsense question answering.
- [1334] arXiv:2401.12593 (cross-list from cs.IR) [ pdf , other ]
-
Title: MOReGIn: Multi-Objective Recommendation at the Global and Individual LevelsSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Multi-Objective Recommender Systems (MORSs) emerged as a paradigm to guarantee multiple (often conflicting) goals. Besides accuracy, a MORS can operate at the global level, where additional beyond-accuracy goals are met for the system as a whole, or at the individual level, meaning that the recommendations are tailored to the needs of each user. The state-of-the-art MORSs either operate at the global or individual level, without assuming the co-existence of the two perspectives. In this study, we show that when global and individual objectives co-exist, MORSs are not able to meet both types of goals. To overcome this issue, we present an approach that regulates the recommendation lists so as to guarantee both global and individual perspectives, while preserving its effectiveness. Specifically, as individual perspective, we tackle genre calibration and, as global perspective, provider fairness. We validate our approach on two real-world datasets, publicly released with this paper.
- [1335] arXiv:2401.12631 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Reply to Makelov et al. (2023)'s "Interpretability Illusion" ArgumentsAuthors: Zhengxuan Wu , Atticus Geiger , Jing Huang , Aryaman Arora , Thomas Icard , Christopher Potts , Noah D. GoodmanComments: 20 pages, 14 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions". We first review Makelov et al. (2023)'s technical notion of what an "interpretability illusion" is, and then we show that even intuitive and desirable explanations can qualify as illusions in this sense. As a result, their method of discovering "illusions" can reject explanations they consider "non-illusory". We then argue that the illusions Makelov et al. (2023) see in practice are artifacts of their training and evaluation paradigms. We close by emphasizing that, though we disagree with their core characterization, Makelov et al. (2023)'s examples and discussion have undoubtedly pushed the field of interpretability forward.
- [1336] arXiv:2401.12632 (cross-list from cs.SE) [ pdf , other ]
-
Title: Modeling Resilience of Collaborative AI SystemsComments: This paper is accepted at the 3rd International Conference on AI Engineering - Software Engineering for AI (CAIN 2024), Lisbon, PortugalSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: A Collaborative Artificial Intelligence System (CAIS) performs actions in collaboration with the human to achieve a common goal. CAISs can use a trained AI model to control human-system interaction, or they can use human interaction to dynamically learn from humans in an online fashion. In online learning with human feedback, the AI model evolves by monitoring human interaction through the system sensors in the learning state, and actuates the autonomous components of the CAIS based on the learning in the operational state. Therefore, any disruptive event affecting these sensors may affect the AI model's ability to make accurate decisions and degrade the CAIS performance. Consequently, it is of paramount importance for CAIS managers to be able to automatically track the system performance to understand the resilience of the CAIS upon such disruptive events. In this paper, we provide a new framework to model CAIS performance when the system experiences a disruptive event. With our framework, we introduce a model of performance evolution of CAIS. The model is equipped with a set of measures that aim to support CAIS managers in the decision process to achieve the required resilience of the system. We tested our framework on a real-world case study of a robot collaborating online with the human, when the system is experiencing a disruptive event. The case study shows that our framework can be adopted in CAIS and integrated into the online execution of the CAIS activities.
- [1337] arXiv:2401.12646 (cross-list from cs.MA) [ pdf , other ]
-
Title: Emergent Cooperation under Uncertain Incentive AlignmentSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Abstract: Understanding the emergence of cooperation in systems of computational agents is crucial for the development of effective cooperative AI. Interaction among individuals in real-world settings are often sparse and occur within a broad spectrum of incentives, which often are only partially known. In this work, we explore how cooperation can arise among reinforcement learning agents in scenarios characterised by infrequent encounters, and where agents face uncertainty about the alignment of their incentives with those of others. To do so, we train the agents under a wide spectrum of environments ranging from fully competitive, to fully cooperative, to mixed-motives. Under this type of uncertainty we study the effects of mechanisms, such as reputation and intrinsic rewards, that have been proposed in the literature to foster cooperation in mixed-motives environments. Our findings show that uncertainty substantially lowers the agents' ability to engage in cooperative behaviour, when that would be the best course of action. In this scenario, the use of effective reputation mechanisms and intrinsic rewards boosts the agents' capability to act nearly-optimally in cooperative environments, while greatly enhancing cooperation in mixed-motive environments as well.
- [1338] arXiv:2401.12662 (cross-list from cs.RO) [ pdf , other ]
-
Title: Integrating Human Expertise in Continuous Spaces: A Novel Interactive Bayesian Optimization Framework with Preference Expected ImprovementSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: Interactive Machine Learning (IML) seeks to integrate human expertise into machine learning processes. However, most existing algorithms cannot be applied to Realworld Scenarios because their state spaces and/or action spaces are limited to discrete values. Furthermore, the interaction of all existing methods is restricted to deciding between multiple proposals. We therefore propose a novel framework based on Bayesian Optimization (BO). Interactive Bayesian Optimization (IBO) enables collaboration between machine learning algorithms and humans. This framework captures user preferences and provides an interface for users to shape the strategy by hand. Additionally, we've incorporated a new acquisition function, Preference Expected Improvement (PEI), to refine the system's efficiency using a probabilistic model of the user preferences. Our approach is geared towards ensuring that machines can benefit from human expertise, aiming for a more aligned and effective learning process. In the course of this work, we applied our method to simulations and in a real world task using a Franka Panda robot to show human-robot collaboration.
- [1339] arXiv:2401.12665 (cross-list from cs.CV) [ pdf , other ]
-
Title: ClipSAM: CLIP and SAM Collaboration for Zero-Shot Anomaly SegmentationComments: 17 pages,17 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recently, foundational models such as CLIP and SAM have shown promising performance for the task of Zero-Shot Anomaly Segmentation (ZSAS). However, either CLIP-based or SAM-based ZSAS methods still suffer from non-negligible key drawbacks: 1) CLIP primarily focuses on global feature alignment across different inputs, leading to imprecise segmentation of local anomalous parts; 2) SAM tends to generate numerous redundant masks without proper prompt constraints, resulting in complex post-processing requirements. In this work, we innovatively propose a CLIP and SAM collaboration framework called ClipSAM for ZSAS. The insight behind ClipSAM is to employ CLIP's semantic understanding capability for anomaly localization and rough segmentation, which is further used as the prompt constraints for SAM to refine the anomaly segmentation results. In details, we introduce a crucial Unified Multi-scale Cross-modal Interaction (UMCI) module for interacting language with visual features at multiple scales of CLIP to reason anomaly positions. Then, we design a novel Multi-level Mask Refinement (MMR) module, which utilizes the positional information as multi-level prompts for SAM to acquire hierarchical levels of masks and merges them. Extensive experiments validate the effectiveness of our approach, achieving the optimal segmentation performance on the MVTec-AD and VisA datasets.
- [1340] arXiv:2401.12681 (cross-list from cs.LG) [ pdf , other ]
-
Title: Non-Neighbors Also Matter to Kriging: A New Contrastive-Prototypical LearningComments: Accepted in AISTATS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Kriging aims at estimating the attributes of unsampled geo-locations from observations in the spatial vicinity or physical connections, which helps mitigate skewed monitoring caused by under-deployed sensors. Existing works assume that neighbors' information offers the basis for estimating the attributes of the unobserved target while ignoring non-neighbors. However, non-neighbors could also offer constructive information, and neighbors could also be misleading. To this end, we propose ``Contrastive-Prototypical'' self-supervised learning for Kriging (KCP) to refine valuable information from neighbors and recycle the one from non-neighbors. As a pre-trained paradigm, we conduct the Kriging task from a new perspective of representation: we aim to first learn robust and general representations and then recover attributes from representations. A neighboring contrastive module is designed that coarsely learns the representations by narrowing the representation distance between the target and its neighbors while pushing away the non-neighbors. In parallel, a prototypical module is introduced to identify similar representations via exchanged prediction, thus refining the misleading neighbors and recycling the useful non-neighbors from the neighboring contrast component. As a result, not all the neighbors and some of the non-neighbors will be used to infer the target. To encourage the two modules above to learn general and robust representations, we design an adaptive augmentation module that incorporates data-driven attribute augmentation and centrality-based topology augmentation over the spatiotemporal Kriging graph data. Extensive experiments on real-world datasets demonstrate the superior performance of KCP compared to its peers with 6% improvements and exceptional transferability and robustness. The code is available at this https URL
- [1341] arXiv:2401.12686 (cross-list from cs.MA) [ pdf , other ]
-
Title: Learning Mean Field Games on Sparse Graphs: A Hybrid Graphex ApproachComments: accepted at ICLR 2024Subjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Abstract: Learning the behavior of large agent populations is an important task for numerous research areas. Although the field of multi-agent reinforcement learning (MARL) has made significant progress towards solving these systems, solutions for many agents often remain computationally infeasible and lack theoretical guarantees. Mean Field Games (MFGs) address both of these issues and can be extended to Graphon MFGs (GMFGs) to include network structures between agents. Despite their merits, the real world applicability of GMFGs is limited by the fact that graphons only capture dense graphs. Since most empirically observed networks show some degree of sparsity, such as power law graphs, the GMFG framework is insufficient for capturing these network topologies. Thus, we introduce the novel concept of Graphex MFGs (GXMFGs) which builds on the graph theoretical concept of graphexes. Graphexes are the limiting objects to sparse graph sequences that also have other desirable features such as the small world property. Learning equilibria in these games is challenging due to the rich and sparse structure of the underlying graphs. To tackle these challenges, we design a new learning algorithm tailored to the GXMFG setup. This hybrid graphex learning approach leverages that the system mainly consists of a highly connected core and a sparse periphery. After defining the system and providing a theoretical analysis, we state our learning approach and demonstrate its learning capabilities on both synthetic graphs and real-world networks. This comparison shows that our GXMFG learning algorithm successfully extends MFGs to a highly relevant class of hard, realistic learning problems that are not accurately addressed by current MARL and MFG methods.
- [1342] arXiv:2401.12689 (cross-list from cs.LG) [ pdf , other ]
-
Title: Energy-based Automated Model EvaluationComments: ICLR2024 poster paperSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: The conventional evaluation protocols on machine learning models rely heavily on a labeled, i.i.d-assumed testing dataset, which is not often present in real world applications. The Automated Model Evaluation (AutoEval) shows an alternative to this traditional workflow, by forming a proximal prediction pipeline of the testing performance without the presence of ground-truth labels. Despite its recent successes, the AutoEval frameworks still suffer from an overconfidence issue, substantial storage and computational cost. In that regard, we propose a novel measure -- Meta-Distribution Energy (MDE) -- that allows the AutoEval framework to be both more efficient and effective. The core of the MDE is to establish a meta-distribution statistic, on the information (energy) associated with individual samples, then offer a smoother representation enabled by energy-based learning. We further provide our theoretical insights by connecting the MDE with the classification loss. We provide extensive experiments across modalities, datasets and different architectural backbones to validate MDE's validity, together with its superiority compared with prior approaches. We also prove MDE's versatility by showing its seamless integration with large-scale models, and easy adaption to learning scenarios with noisy- or imbalanced- labels. Code and data are available: this https URL
- [1343] arXiv:2401.12708 (cross-list from cs.LG) [ pdf , other ]
-
Title: Deep Neural Network Benchmarks for Selective ClassificationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: With the increasing deployment of machine learning models in many socially-sensitive tasks, there is a growing demand for reliable and trustworthy predictions. One way to accomplish these requirements is to allow a model to abstain from making a prediction when there is a high risk of making an error. This requires adding a selection mechanism to the model, which selects those examples for which the model will provide a prediction. The selective classification framework aims to design a mechanism that balances the fraction of rejected predictions (i.e., the proportion of examples for which the model does not make a prediction) versus the improvement in predictive performance on the selected predictions. Multiple selective classification frameworks exist, most of which rely on deep neural network architectures. However, the empirical evaluation of the existing approaches is still limited to partial comparisons among methods and settings, providing practitioners with little insight into their relative merits. We fill this gap by benchmarking 18 baselines on a diverse set of 44 datasets that includes both image and tabular data. Moreover, there is a mix of binary and multiclass tasks. We evaluate these approaches using several criteria, including selective error rate, empirical coverage, distribution of rejected instance's classes, and performance on out-of-distribution instances. The results indicate that there is not a single clear winner among the surveyed baselines, and the best method depends on the users' objectives.
- [1344] arXiv:2401.12714 (cross-list from cs.SE) [ pdf , other ]
-
Title: Evaluation of large language models for assessing code maintainabilityComments: 14 pages, 4 figures, 8 tablesSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Increased availability of open-source software repositories and recent advances in code analysis using large language models (LLMs) has triggered a wave of new work to automate software engineering tasks that were previously very difficult to automate. In this paper, we investigate a recent line of work that hypothesises that comparing the probability of code generated by LLMs with the probability the current code would have had can indicate potential quality problems. We investigate the association between the cross-entropy of code generated by ten different models (based on GPT2 and Llama2) and the following quality aspects: readability, understandability, complexity, modularisation, and overall maintainability assessed by experts and available in an benchmark dataset. Our results show that, controlling for the number of logical lines of codes (LLOC), cross-entropy computed by LLMs is indeed a predictor of maintainability on a class level (the higher the cross-entropy the lower the maintainability). However, this relation is reversed when one does not control for LLOC (e.g., comparing small classes with longer ones). Furthermore, while the complexity of LLMs affects the range of cross-entropy (smaller models tend to have a wider range of cross-entropy), this plays a significant role in predicting maintainability aspects. Our study limits itself on ten different pretrained models (based on GPT2 and Llama2) and on maintainability aspects collected by Schnappinger et al. When controlling for logical lines of code (LLOC), cross-entropy is a predictor of maintainability. However, while related work has shown the potential usefulness of cross-entropy at the level of tokens or short sequences, at the class level this criterion alone may prove insufficient to predict maintainability and further research is needed to make best use of this information in practice.
- [1345] arXiv:2401.12756 (cross-list from cs.CL) [ pdf , other ]
-
Title: What the Weight?! A Unified Framework for Zero-Shot Knowledge CompositionComments: Accepted to Findings of the ACL: EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The knowledge encapsulated in a model is the core factor determining its final performance on downstream tasks. Much research in NLP has focused on efficient methods for storing and adapting different types of knowledge, e.g., in dedicated modularized structures, and on how to effectively combine these, e.g., by learning additional parameters. However, given the many possible options, a thorough understanding of the mechanisms involved in these compositions is missing, and hence it remains unclear which strategies to utilize. To address this research gap, we propose a novel framework for zero-shot module composition, which encompasses existing and some novel variations for selecting, weighting, and combining parameter modules under a single unified notion. Focusing on the scenario of domain knowledge and adapter layers, our framework provides a systematic unification of concepts, allowing us to conduct the first comprehensive benchmarking study of various zero-shot knowledge composition strategies. In particular, we test two module combination methods and five selection and weighting strategies for their effectiveness and efficiency in an extensive experimental setup. Our results highlight the efficacy of ensembling but also hint at the power of simple though often-ignored weighting methods. Further in-depth analyses allow us to understand the role of weighting vs. top-k selection, and show that, to a certain extent, the performance of adapter composition can even be predicted.
- [1346] arXiv:2401.12803 (cross-list from cs.IT) [ pdf , other ]
-
Title: Enhancements for 5G NR PRACH Reception: An AI/ML ApproachSubjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP)
Abstract: Random Access is an important step in enabling the initial attachment of a User Equipment (UE) to a Base Station (gNB). The UE identifies itself by embedding a Preamble Index (RAPID) in the phase rotation of a known base sequence, which it transmits on the Physical Random Access Channel (PRACH). The signal on the PRACH also enables the estimation of propagation delay, often known as Timing Advance (TA), which is induced by virtue of the UE's position. Traditional receivers estimate the RAPID and TA using correlation-based techniques. This paper presents an alternative receiver approach that uses AI/ML models, wherein two neural networks are proposed, one for the RAPID and one for the TA. Different from other works, these two models can run in parallel as opposed to sequentially. Experiments with both simulated data and over-the-air hardware captures highlight the improved performance of the proposed AI/ML-based techniques compared to conventional correlation methods.
- [1347] arXiv:2401.12806 (cross-list from cs.LG) [ pdf , other ]
-
Title: Binary structured physics-informed neural networks for solving equations with rapidly changing solutionsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Physics-informed neural networks (PINNs), rooted in deep learning, have emerged as a promising approach for solving partial differential equations (PDEs). By embedding the physical information described by PDEs into feedforward neural networks, PINNs are trained as surrogate models to approximate solutions without the need for label data. Nevertheless, even though PINNs have shown remarkable performance, they can face difficulties, especially when dealing with equations featuring rapidly changing solutions. These difficulties encompass slow convergence, susceptibility to becoming trapped in local minima, and reduced solution accuracy. To address these issues, we propose a binary structured physics-informed neural network (BsPINN) framework, which employs binary structured neural network (BsNN) as the neural network component. By leveraging a binary structure that reduces inter-neuron connections compared to fully connected neural networks, BsPINNs excel in capturing the local features of solutions more effectively and efficiently. These features are particularly crucial for learning the rapidly changing in the nature of solutions. In a series of numerical experiments solving Burgers equation, Euler equation, Helmholtz equation, and high-dimension Poisson equation, BsPINNs exhibit superior convergence speed and heightened accuracy compared to PINNs. From these experiments, we discover that BsPINNs resolve the issues caused by increased hidden layers in PINNs resulting in over-smoothing, and prevent the decline in accuracy due to non-smoothness of PDEs solutions.
- [1348] arXiv:2401.12819 (cross-list from cs.LG) [ pdf , other ]
-
Title: Dynamic Layer Tying for Parameter-Efficient TransformersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In the pursuit of reducing the number of trainable parameters in deep transformer networks, we employ Reinforcement Learning to dynamically select layers during training and tie them together. Every few iterations, the RL agent is asked whether to train each layer $i$ independently or to copy the weights of a previous layer $j<i$. This facilitates weight sharing, reduces the number of trainable parameters, and also serves as an effective regularization technique. Experimental evaluations validate that our model modestly outperforms the baseline transformer model with regard to perplexity and drastically reduces the number of trainable parameters. In particular, the memory consumption during training is up to one order of magnitude less than the conventional training method.
- [1349] arXiv:2401.12822 (cross-list from eess.SY) [ pdf , other ]
-
Title: Deep Learning Based Simulators for the Phosphorus Removal Process Control in Wastewater Treatment via Deep Reinforcement Learning AlgorithmsAuthors: Esmaeel Mohammadi , Mikkel Stokholm-Bjerregaard , Aviaja Anna Hansen , Per Halkjær Nielsen , Daniel Ortiz-Arroyo , Petar DurdevicComments: Journal PaperJournal-ref: Engineering Applications of Artificial Intelligence 133 (2024) 107992Subjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Phosphorus removal is vital in wastewater treatment to reduce reliance on limited resources. Deep reinforcement learning (DRL) is a machine learning technique that can optimize complex and nonlinear systems, including the processes in wastewater treatment plants, by learning control policies through trial and error. However, applying DRL to chemical and biological processes is challenging due to the need for accurate simulators. This study trained six models to identify the phosphorus removal process and used them to create a simulator for the DRL environment. Although the models achieved high accuracy (>97%), uncertainty and incorrect prediction behavior limited their performance as simulators over longer horizons. Compounding errors in the models' predictions were identified as one of the causes of this problem. This approach for improving process control involves creating simulation environments for DRL algorithms, using data from supervisory control and data acquisition (SCADA) systems with a sufficient historical horizon without complex system modeling or parameter estimation.
- [1350] arXiv:2401.12830 (cross-list from cs.LG) [ pdf , other ]
-
Title: Enhancing Next Destination Prediction: A Novel LSTM Approach Using Real-World Airline DataSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In the modern transportation industry, accurate prediction of travelers' next destinations brings multiple benefits to companies, such as customer satisfaction and targeted marketing. This study focuses on developing a precise model that captures the sequential patterns and dependencies in travel data, enabling accurate predictions of individual travelers' future destinations. To achieve this, a novel model architecture with a sliding window approach based on Long Short-Term Memory (LSTM) is proposed for destination prediction in the transportation industry. The experimental results highlight satisfactory performance and high scores achieved by the proposed model across different data sizes and performance metrics. This research contributes to advancing destination prediction methods, empowering companies to deliver personalized recommendations and optimize customer experiences in the dynamic travel landscape.
- [1351] arXiv:2401.12835 (cross-list from cs.CV) [ pdf , other ]
-
Title: SGTR+: End-to-end Scene Graph Generation with TransformerComments: Accepted by TPAMI: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property. Most previous works adopt a bottom-up, two-stage or point-based, one-stage approach, which often suffers from high time complexity or suboptimal designs. In this work, we propose a novel SGG method to address the aforementioned issues, formulating the task as a bipartite graph construction problem. To address the issues above, we create a transformer-based end-to-end framework to generate the entity and entity-aware predicate proposal set, and infer directed edges to form relation triplets. Moreover, we design a graph assembling module to infer the connectivity of the bipartite scene graph based on our entity-aware structure, enabling us to generate the scene graph in an end-to-end manner. Based on bipartite graph assembling paradigm, we further propose a new technical design to address the efficacy of entity-aware modeling and optimization stability of graph assembling. Equipped with the enhanced entity-aware design, our method achieves optimal performance and time-complexity. Extensive experimental results show that our design is able to achieve the state-of-the-art or comparable performance on three challenging benchmarks, surpassing most of the existing approaches and enjoying higher efficiency in inference. Code is available: this https URL
- [1352] arXiv:2401.12851 (cross-list from cs.CV) [ pdf , other ]
-
Title: Classification of grapevine varieties using UAV hyperspectral imagingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The classification of different grapevine varieties is a relevant phenotyping task in Precision Viticulture since it enables estimating the growth of vineyard rows dedicated to different varieties, among other applications concerning the wine industry. This task can be performed with destructive methods that require time-consuming tasks, including data collection and analysis in the laboratory. However, Unmanned Aerial Vehicles (UAV) provide a more efficient and less prohibitive approach to collecting hyperspectral data, despite acquiring noisier data. Therefore, the first task is the processing of these data to correct and downsample large amounts of data. In addition, the hyperspectral signatures of grape varieties are very similar. In this work, a Convolutional Neural Network (CNN) is proposed for classifying seventeen varieties of red and white grape variants. Rather than classifying single samples, these are processed together with their neighbourhood. Hence, the extraction of spatial and spectral features is addressed with 1) a spatial attention layer and 2) Inception blocks. The pipeline goes from processing to dataset elaboration, finishing with the training phase. The fitted model is evaluated in terms of response time, accuracy and data separability, and compared with other state-of-the-art CNNs for classifying hyperspectral data. Our network was proven to be much more lightweight with a reduced number of input bands, a lower number of trainable weights and therefore, reduced training time. Despite this, the evaluated metrics showed much better results for our network (~99% overall accuracy), in comparison with previous works barely achieving 81% OA.
- [1353] arXiv:2401.12862 (cross-list from cs.CV) [ pdf , other ]
-
Title: FedRSU: Federated Learning for Scene Flow Estimation on Roadside UnitsAuthors: Shaoheng Fang , Rui Ye , Wenhao Wang , Zuhong Liu , Yuxiao Wang , Yafei Wang , Siheng Chen , Yanfeng WangSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Roadside unit (RSU) can significantly improve the safety and robustness of autonomous vehicles through Vehicle-to-Everything (V2X) communication. Currently, the usage of a single RSU mainly focuses on real-time inference and V2X collaboration, while neglecting the potential value of the high-quality data collected by RSU sensors. Integrating the vast amounts of data from numerous RSUs can provide a rich source of data for model training. However, the absence of ground truth annotations and the difficulty of transmitting enormous volumes of data are two inevitable barriers to fully exploiting this hidden value. In this paper, we introduce FedRSU, an innovative federated learning framework for self-supervised scene flow estimation. In FedRSU, we present a recurrent self-supervision training paradigm, where for each RSU, the scene flow prediction of points at every timestamp can be supervised by its subsequent future multi-modality observation. Another key component of FedRSU is federated learning, where multiple devices collaboratively train an ML model while keeping the training data local and private. With the power of the recurrent self-supervised learning paradigm, FL is able to leverage innumerable underutilized data from RSU. To verify the FedRSU framework, we construct a large-scale multi-modality dataset RSU-SF. The dataset consists of 17 RSU clients, covering various scenarios, modalities, and sensor settings. Based on RSU-SF, we show that FedRSU can greatly improve model performance in ITS and provide a comprehensive benchmark under diverse FL scenarios. To the best of our knowledge, we provide the first real-world LiDAR-camera multi-modal dataset and benchmark for the FL community.
- [1354] arXiv:2401.12863 (cross-list from cs.CL) [ pdf , other ]
-
Title: KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts ReasoningComments: AAAI 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have demonstrated impressive performance in natural language processing tasks by leveraging chain of thought (CoT) that enables step-by-step thinking. Extending LLMs with multimodal capabilities is the recent interest, but incurs computational cost and requires substantial hardware resources. To address these challenges, we propose KAM-CoT a framework that integrates CoT reasoning, Knowledge Graphs (KGs), and multiple modalities for a comprehensive understanding of multimodal tasks. KAM-CoT adopts a two-stage training process with KG grounding to generate effective rationales and answers. By incorporating external knowledge from KGs during reasoning, the model gains a deeper contextual understanding reducing hallucinations and enhancing the quality of answers. This knowledge-augmented CoT reasoning empowers the model to handle questions requiring external context, providing more informed answers. Experimental findings show KAM-CoT outperforms the state-of-the-art methods. On the ScienceQA dataset, we achieve an average accuracy of 93.87%, surpassing GPT-3.5 (75.17%) by 18% and GPT-4 (83.99%) by 10%. Remarkably, KAM-CoT achieves these results with only 280M trainable parameters at a time, demonstrating its cost-efficiency and effectiveness.
- [1355] arXiv:2401.12873 (cross-list from cs.CL) [ pdf , other ]
-
Title: Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward ModelAuthors: Zhiwei He , Xing Wang , Wenxiang Jiao , Zhuosheng Zhang , Rui Wang , Shuming Shi , Zhaopeng TuComments: NAACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Insufficient modeling of human preferences within the reward model is a major obstacle for leveraging human feedback to improve translation quality. Fortunately, quality estimation (QE), which predicts the quality of a given translation without reference, has achieved impressive alignment with human evaluations in the last two years. In this work, we investigate the potential of employing the QE model as the reward model to predict human preferences for feedback training. We first identify the overoptimization problem during QE-based feedback training, manifested as an increase in reward while translation quality declines. We examine the problem and argue that the vulnerability of the QE model might lead to high rewards for incorrect translations, resulting in overoptimization and error propagation. To address the problem, we adopt a simple yet effective method that uses heuristic rules to detect the incorrect translations and assigns a penalty term to the reward scores of them. Experimental results show that the proposed QE-based feedback training achieves consistent and significant improvements across various settings, further verified through human preference studies. Our subsequent analysis demonstrates the high data efficiency of the proposed QE-based feedback training: it outperforms systems using larger parallel corpora by a small amount of monolingual data. Our code is available at: this https URL
- [1356] arXiv:2401.12874 (cross-list from cs.CL) [ pdf , other ]
-
Title: From Understanding to Utilization: A Survey on Explainability for Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Explainability for Large Language Models (LLMs) is a critical yet challenging aspect of natural language processing. As LLMs are increasingly integral to diverse applications, their "black-box" nature sparks significant concerns regarding transparency and ethical use. This survey underscores the imperative for increased explainability in LLMs, delving into both the research on explainability and the various methodologies and tasks that utilize an understanding of these models. Our focus is primarily on pre-trained Transformer-based LLMs, such as LLaMA family, which pose distinctive interpretability challenges due to their scale and complexity. In terms of existing methods, we classify them into local and global analyses, based on their explanatory objectives. When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement. Additionally, we examine representative evaluation metrics and datasets, elucidating their advantages and limitations. Our goal is to reconcile theoretical and empirical understanding with practical implementation, proposing exciting avenues for explanatory techniques and their applications in the LLMs era.
- [1357] arXiv:2401.12914 (cross-list from cs.IT) [ pdf , other ]
-
Title: Emergent Communication Protocol Learning for Task Offloading in Industrial Internet of ThingsJournal-ref: GLOBECOM 2023Subjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: In this paper, we leverage a multi-agent reinforcement learning (MARL) framework to jointly learn a computation offloading decision and multichannel access policy with corresponding signaling. Specifically, the base station and industrial Internet of Things mobile devices are reinforcement learning agents that need to cooperate to execute their computation tasks within a deadline constraint. We adopt an emergent communication protocol learning framework to solve this problem. The numerical results illustrate the effectiveness of emergent communication in improving the channel access success rate and the number of successfully computed tasks compared to contention-based, contention-free, and no-communication approaches. Moreover, the proposed task offloading policy outperforms remote and local computation baselines.
- [1358] arXiv:2401.12947 (cross-list from cs.CL) [ pdf , other ]
-
Title: Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural RecursionComments: arXiv admin note: text overlap with arXiv:2305.14699Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Abstract: This paper investigates the ability of transformer-based models to learn structural recursion from examples. Recursion is a universal concept in both natural and formal languages. Structural recursion is central to the programming language and formal mathematics tasks where symbolic tools currently excel beyond neural models, such as inferring semantic relations between datatypes and emulating program behavior. We introduce a general framework that nicely connects the abstract concepts of structural recursion in the programming language domain to concrete sequence modeling problems and learned models' behavior. The framework includes a representation that captures the general \textit{syntax} of structural recursion, coupled with two different frameworks for understanding their \textit{semantics} -- one that is more natural from a programming languages perspective and one that helps bridge that perspective with a mechanistic understanding of the underlying transformer architecture.
With our framework as a powerful conceptual tool, we identify different issues under various set-ups. The models trained to emulate recursive computations cannot fully capture the recursion yet instead fit short-cut algorithms and thus cannot solve certain edge cases that are under-represented in the training distribution. In addition, it is difficult for state-of-the-art large language models (LLMs) to mine recursive rules from in-context demonstrations. Meanwhile, these LLMs fail in interesting ways when emulating reduction (step-wise computation) of the recursive function. - [1359] arXiv:2401.12954 (cross-list from cs.CL) [ pdf , other ]
-
Title: Meta-Prompting: Enhancing Language Models with Task-Agnostic ScaffoldingComments: this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: We introduce meta-prompting, an effective scaffolding technique designed to enhance the functionality of language models (LMs). This approach transforms a single LM into a multi-faceted conductor, adept at managing and integrating multiple independent LM queries. By employing high-level instructions, meta-prompting guides the LM to break down complex tasks into smaller, more manageable subtasks. These subtasks are then handled by distinct "expert" instances of the same LM, each operating under specific, tailored instructions. Central to this process is the LM itself, in its role as the conductor, which ensures seamless communication and effective integration of the outputs from these expert models. It additionally employs its inherent critical thinking and robust verification processes to refine and authenticate the end result. This collaborative prompting approach empowers a single LM to simultaneously act as a comprehensive orchestrator and a panel of diverse experts, significantly enhancing its performance across a wide array of tasks. The zero-shot, task-agnostic nature of meta-prompting greatly simplifies user interaction by obviating the need for detailed, task-specific instructions. Furthermore, our research demonstrates the seamless integration of external tools, such as a Python interpreter, into the meta-prompting framework, thereby broadening its applicability and utility. Through rigorous experimentation with GPT-4, we establish the superiority of meta-prompting over conventional scaffolding methods: When averaged across all tasks, including the Game of 24, Checkmate-in-One, and Python Programming Puzzles, meta-prompting, augmented with a Python interpreter functionality, surpasses standard prompting by 17.1%, expert (dynamic) prompting by 17.3%, and multipersona prompting by 15.2%.
- [1360] arXiv:2401.12963 (cross-list from cs.RO) [ pdf , other ]
-
Title: AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic AgentsAuthors: Michael Ahn , Debidatta Dwibedi , Chelsea Finn , Montse Gonzalez Arenas , Keerthana Gopalakrishnan , Karol Hausman , Brian Ichter , Alex Irpan , Nikhil Joshi , Ryan Julian , Sean Kirmani , Isabel Leal , Edward Lee , Sergey Levine , Yao Lu , Isabel Leal , Sharath Maddineni , Kanishka Rao , Dorsa Sadigh , Pannag Sanketi , Pierre Sermanet , Quan Vuong , Stefan Welker , Fei Xia , Ted Xiao , Peng Xu , Steve Xu , Zhuo XuComments: 26 pages, 9 figuresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Foundation models that incorporate language, vision, and more recently actions have revolutionized the ability to harness internet scale data to reason about useful tasks. However, one of the key challenges of training embodied foundation models is the lack of data grounded in the physical world. In this paper, we propose AutoRT, a system that leverages existing foundation models to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision. AutoRT leverages vision-language models (VLMs) for scene understanding and grounding, and further uses large language models (LLMs) for proposing diverse and novel instructions to be performed by a fleet of robots. Guiding data collection by tapping into the knowledge of foundation models enables AutoRT to effectively reason about autonomy tradeoffs and safety while significantly scaling up data collection for robot learning. We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies. We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.
- [1361] arXiv:2401.12972 (cross-list from cs.CV) [ pdf , other ]
-
Title: On the Efficacy of Text-Based Input Modalities for Action AnticipationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Abstract: Although the task of anticipating future actions is highly uncertain, information from additional modalities help to narrow down plausible action choices. Each modality provides different environmental context for the model to learn from. While previous multi-modal methods leverage information from modalities such as video and audio, we primarily explore how text inputs for actions and objects can also enable more accurate action anticipation. Therefore, we propose a Multi-modal Anticipative Transformer (MAT), an attention-based video transformer architecture that jointly learns from multi-modal features and text captions. We train our model in two-stages, where the model first learns to predict actions in the video clip by aligning with captions, and during the second stage, we fine-tune the model to predict future actions. Compared to existing methods, MAT has the advantage of learning additional environmental context from two kinds of text inputs: action descriptions during the pre-training stage, and the text inputs for detected objects and actions during modality feature fusion. Through extensive experiments, we evaluate the effectiveness of the pre-training stage, and show that our model outperforms previous methods on all datasets. In addition, we examine the impact of object and action information obtained via text and perform extensive ablations. We evaluate the performance on on three datasets: EpicKitchens-100, EpicKitchens-55 and EGTEA GAZE+; and show that text descriptions do indeed aid in more effective action anticipation.
- [1362] arXiv:2401.12975 (cross-list from cs.CV) [ pdf , other ]
-
Title: HAZARD Challenge: Embodied Decision Making in Dynamically Changing EnvironmentsAuthors: Qinhong Zhou , Sunli Chen , Yisong Wang , Haozhe Xu , Weihua Du , Hongxin Zhang , Yilun Du , Joshua B. Tenenbaum , Chuang GanComments: ICLR 2024. The first two authors contributed equally to this workSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world. Typically, these environments remain unchanged unless agents interact with them. However, in real-world scenarios, agents might also face dynamically changing environments characterized by unexpected events and need to rapidly take action accordingly. To remedy this gap, we propose a new simulated embodied benchmark, called HAZARD, specifically designed to assess the decision-making abilities of embodied agents in dynamic situations. HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind, and specifically supports the utilization of large language models (LLMs) to assist common sense reasoning and decision-making. This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines, including reinforcement learning (RL), rule-based, and search-based methods in dynamically changing environments. As a first step toward addressing this challenge using large language models, we further develop an LLM-based agent and perform an in-depth analysis of its promise and challenge of solving these challenging tasks. HAZARD is available at this https URL .
- [1363] arXiv:2401.12983 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Assessing Large Language Models in Mechanical Engineering Education: A Study on Mechanics-Focused Conceptual UnderstandingAuthors: Jie Tian , Jixin Hou , Zihao Wu , Peng Shu , Zhengliang Liu , Yujie Xiang , Beikang Gu , Nicholas Filla , Yiwei Li , Ning Liu , Xianyan Chen , Keke Tang , Tianming Liu , Xianqiao WangComments: 30 pages, 7 figures, and 1 tableSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Physics Education (physics.ed-ph)
Abstract: This study is a pioneering endeavor to investigate the capabilities of Large Language Models (LLMs) in addressing conceptual questions within the domain of mechanical engineering with a focus on mechanics. Our examination involves a manually crafted exam encompassing 126 multiple-choice questions, spanning various aspects of mechanics courses, including Fluid Mechanics, Mechanical Vibration, Engineering Statics and Dynamics, Mechanics of Materials, Theory of Elasticity, and Continuum Mechanics. Three LLMs, including ChatGPT (GPT-3.5), ChatGPT (GPT-4), and Claude (Claude-2.1), were subjected to evaluation against engineering faculties and students with or without mechanical engineering background. The findings reveal GPT-4's superior performance over the other two LLMs and human cohorts in answering questions across various mechanics topics, except for Continuum Mechanics. This signals the potential future improvements for GPT models in handling symbolic calculations and tensor analyses. The performances of LLMs were all significantly improved with explanations prompted prior to direct responses, underscoring the crucial role of prompt engineering. Interestingly, GPT-3.5 demonstrates improved performance with prompts covering a broader domain, while GPT-4 excels with prompts focusing on specific subjects. Finally, GPT-4 exhibits notable advancements in mitigating input bias, as evidenced by guessing preferences for humans. This study unveils the substantial potential of LLMs as highly knowledgeable assistants in both mechanical pedagogy and scientific research.
- [1364] arXiv:2401.12986 (cross-list from cs.CL) [ pdf , other ]
-
Title: Crowdsourced Adaptive SurveysAuthors: Yamil VelezComments: 25 pages, 5 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Applications (stat.AP)
Abstract: Public opinion surveys are vital for informing democratic decision-making, but responding to rapidly changing information environments and measuring beliefs within niche communities can be challenging for traditional survey methods. This paper introduces a crowdsourced adaptive survey methodology (CSAS) that unites advances in natural language processing and adaptive algorithms to generate question banks that evolve with user input. The CSAS method converts open-ended text provided by participants into Likert-style items and applies a multi-armed bandit algorithm to determine user-provided questions that should be prioritized in the survey. The method's adaptive nature allows for the exploration of new survey questions, while imposing minimal costs in survey length. Applications in the domains of Latino information environments and issue importance showcase CSAS's ability to identify claims or issues that might otherwise be difficult to track using standard approaches. I conclude by discussing the method's potential for studying topics where participant-generated content might improve our understanding of public opinion.
- [1365] arXiv:2401.12988 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Few-Shot Learning for Chronic Disease Management: Leveraging Large Language Models and Multi-Prompt Engineering with Medical Knowledge InjectionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This study harnesses state-of-the-art AI technology for chronic disease management, specifically in detecting various mental disorders through user-generated textual content. Existing studies typically rely on fully supervised machine learning, which presents challenges such as the labor-intensive manual process of annotating extensive training data for each disease and the need to design specialized deep learning architectures for each problem. To address such challenges, we propose a novel framework that leverages advanced AI techniques, including large language models and multi-prompt engineering. Specifically, we address two key technical challenges in data-driven chronic disease management: (1) developing personalized prompts to represent each user's uniqueness and (2) incorporating medical knowledge into prompts to provide context for chronic disease detection, instruct learning objectives, and operationalize prediction goals. We evaluate our method using four mental disorders, which are prevalent chronic diseases worldwide, as research cases. On the depression detection task, our method (F1 = 0.975~0.978) significantly outperforms traditional supervised learning paradigms, including feature engineering (F1 = 0.760) and architecture engineering (F1 = 0.756). Meanwhile, our approach demonstrates success in few-shot learning, i.e., requiring only a minimal number of training examples to detect chronic diseases based on user-generated textual content (i.e., only 2, 10, or 100 subjects). Moreover, our method can be generalized to other mental disorder detection tasks, including anorexia, pathological gambling, and self-harm (F1 = 0.919~0.978).
- [1366] arXiv:2401.12996 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: A Comparison of Veterans with Problematic Opioid Use Identified through Natural Language Processing of Clinical Notes versus Using Diagnostic CodesAuthors: Terri Elizabeth Workman , Joel Kupersmith , Phillip Ma , Christopher Spevak , Friedhelm Sandbrink , Yan Cheng Qing Zeng-TreitlerComments: 17 pages, 4 figures, 8 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Background: Electronic health records (EHRs) are a data source for opioid research. Opioid use disorder is known to be under-coded as a diagnosis, yet problematic opioid use can be documented in clinical notes.
Objectives: Our goals were 1) to identify problematic opioid use from a full range of clinical notes; and 2) to compare the characteristics of patients identified as having problematic opioid use, exclusively documented in clinical notes, to those having documented ICD opioid use disorder diagnostic codes.
Materials and Methods: We developed and applied a natural language processing (NLP) tool to the clinical notes of a patient cohort (n=222,371) from two Veteran Affairs service regions to identify patients with problematic opioid use. We also used a set of ICD diagnostic codes to identify patients with opioid use disorder from the same cohort. We compared the demographic and clinical characteristics of patients identified only through NLP, to those of patients identified through ICD codes.
Results: NLP exclusively identified 57,331 patients; 6,997 patients had positive ICD code identifications. Patients exclusively identified through NLP were more likely to be women. Those identified through ICD codes were more likely to be male, younger, have concurrent benzodiazepine prescriptions, more comorbidities, more care encounters, and less likely to be married. Patients in the NLP and ICD groups had substantially elevated comorbidity levels compared to patients not documented as experiencing problematic opioid use.
Conclusions: NLP is a feasible approach for identifying problematic opioid use not otherwise recorded by ICD codes. Clinicians may be reluctant to code for opioid use disorder. It is therefore incumbent on the healthcare team to search for documentation of opioid concerns within clinical notes. - [1367] arXiv:2401.12998 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Evaluating and Enhancing Large Language Models Performance in Domain-specific Medicine: Osteoarthritis Management with DocOAAuthors: Xi Chen , MingKe You , Li Wang , WeiZhi Liu , Yu Fu , Jie Xu , Shaoting Zhang , Gang Chen , Kang Li , Jian LiComments: 16 Pages, 7 FiguresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. This study focused on evaluating and enhancing the clinical capabilities of LLMs in specific domains, using osteoarthritis (OA) management as a case study. A domain specific benchmark framework was developed, which evaluate LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM tailored for OA management that integrates retrieval-augmented generation (RAG) and instruction prompts, was developed. The study compared the performance of GPT-3.5, GPT-4, and a specialized assistant, DocOA, using objective and human evaluations. Results showed that general LLMs like GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations. However, DocOA showed significant improvements. This study introduces a novel benchmark framework which assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs.
- [1368] arXiv:2401.13001 (cross-list from cs.GR) [ pdf , other ]
-
Title: PatternPortrait: Draw Me Like One of Your ScribblesSubjects: Graphics (cs.GR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper introduces a process for generating abstract portrait drawings from pictures. Their unique style is created by utilizing single freehand pattern sketches as references to generate unique patterns for shading. The method involves extracting facial and body features from images and transforming them into vector lines. A key aspect of the research is the development of a graph neural network architecture designed to learn sketch stroke representations in vector form, enabling the generation of diverse stroke variations. The combination of these two approaches creates joyful abstract drawings that are realized via a pen plotter. The presented process garnered positive feedback from an audience of approximately 280 participants.
- [1369] arXiv:2401.13002 (cross-list from cs.CG) [ pdf , other ]
-
Title: Theorem Discovery Amongst Cyclic PolygonsAuthors: Philip Todd (Saltire Software)Comments: In Proceedings ADG 2023, arXiv:2401.10725Journal-ref: EPTCS 398, 2024, pp. 153-164Subjects: Computational Geometry (cs.CG) ; Artificial Intelligence (cs.AI)
Abstract: We examine a class of geometric theorems on cyclic 2n-gons. We prove that if we take n disjoint pairs of sides, each pair separated by an even number of polygon sides, then there is a linear combination of the angles between those sides which is constant. We present a formula for the linear combination, which provides a theorem statement in terms of those angles. We describe a program which uses this result to generate new geometry proof problems and their solutions.
- [1370] arXiv:2401.13034 (cross-list from cs.LG) [ pdf , other ]
-
Title: Locality Sensitive Sparse Encoding for Learning World Models OnlineComments: ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Acquiring an accurate world model online for model-based reinforcement learning (MBRL) is challenging due to data nonstationarity, which typically causes catastrophic forgetting for neural networks (NNs). From the online learning perspective, a Follow-The-Leader (FTL) world model is desirable, which optimally fits all previous experiences at each round. Unfortunately, NN-based models need re-training on all accumulated data at every interaction step to achieve FTL, which is computationally expensive for lifelong agents. In this paper, we revisit models that can achieve FTL with incremental updates. Specifically, our world model is a linear regression model supported by nonlinear random features. The linear part ensures efficient FTL update while the nonlinear random feature empowers the fitting of complex environments. To best trade off model capacity and computation efficiency, we introduce a locality sensitive sparse encoding, which allows us to conduct efficient sparse updates even with very high dimensional nonlinear features. We validate the representation power of our encoding and verify that it allows efficient online learning under data covariate shift. We also show, in the Dyna MBRL setting, that our world models learned online using a single pass of trajectory data either surpass or match the performance of deep world models trained with replay and other continual learning methods.
- [1371] arXiv:2401.13060 (cross-list from cs.CL) [ pdf , other ]
-
Title: TCE at Qur'an QA 2023 Shared Task: Low Resource Enhanced Transformer-based Ensemble Approach for Qur'anic QASubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we present our approach to tackle Qur'an QA 2023 shared tasks A and B. To address the challenge of low-resourced training data, we rely on transfer learning together with a voting ensemble to improve prediction stability across multiple runs. Additionally, we employ different architectures and learning mechanisms for a range of Arabic pre-trained transformer-based models for both tasks. To identify unanswerable questions, we propose using a thresholding mechanism. Our top-performing systems greatly surpass the baseline performance on the hidden split, achieving a MAP score of 25.05% for task A and a partial Average Precision (pAP) of 57.11% for task B.
- [1372] arXiv:2401.13068 (cross-list from cs.CV) [ pdf , other ]
-
Title: Local Background Estimation for Improved Gas Plume Identification in Hyperspectral ImagesComments: Submitted to International Geoscience and Remote Sensing Symposium (IGARSS), 2024. 5 pages, 2 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Deep learning identification models have shown promise for identifying gas plumes in Longwave IR hyperspectral images of urban scenes, particularly when a large library of gases are being considered. Because many gases have similar spectral signatures, it is important to properly estimate the signal from a detected plume. Typically, a scene's global mean spectrum and covariance matrix are estimated to whiten the plume's signal, which removes the background's signature from the gas signature. However, urban scenes can have many different background materials that are spatially and spectrally heterogeneous. This can lead to poor identification performance when the global background estimate is not representative of a given local background material. We use image segmentation, along with an iterative background estimation algorithm, to create local estimates for the various background materials that reside underneath a gas plume. Our method outperforms global background estimation on a set of simulated and real gas plumes. This method shows promise in increasing deep learning identification confidence, while being simple and easy to tune when considering diverse plumes.
- [1373] arXiv:2401.13081 (cross-list from cs.CV) [ pdf , other ]
-
Title: Free Form Medical Visual Question Answering in RadiologyAuthors: Abhishek Narayanan , Rushabh Musthyala , Rahul Sankar , Anirudh Prasad Nistala , Pranav Singh , Jacopo CirroneComments: 6 pages and 4 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Visual Question Answering (VQA) in the medical domain presents a unique, interdisciplinary challenge, combining fields such as Computer Vision, Natural Language Processing, and Knowledge Representation. Despite its importance, research in medical VQA has been scant, only gaining momentum since 2018. Addressing this gap, our research delves into the effective representation of radiology images and the joint learning of multimodal representations, surpassing existing methods. We innovatively augment the SLAKE dataset, enabling our model to respond to a more diverse array of questions, not limited to the immediate content of radiology or pathology images. Our model achieves a top-1 accuracy of 79.55\% with a less complex architecture, demonstrating comparable performance to current state-of-the-art models. This research not only advances medical VQA but also opens avenues for practical applications in diagnostic settings.
- [1374] arXiv:2401.13085 (cross-list from cs.CL) [ pdf , other ]
-
Title: IndiText Boost: Text Augmentation for Low Resource India LanguagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Text Augmentation is an important task for low-resource languages. It helps deal with the problem of data scarcity. A data augmentation strategy is used to deal with the problem of data scarcity. Through the years, much work has been done on data augmentation for the English language. In contrast, very less work has been done on Indian languages. This is contrary to the fact that data augmentation is used to deal with data scarcity. In this work, we focus on implementing techniques like Easy Data Augmentation, Back Translation, Paraphrasing, Text Generation using LLMs, and Text Expansion using LLMs for text classification on different languages. We focus on 6 Indian languages namely: Sindhi, Marathi, Hindi, Gujarati, Telugu, and Sanskrit. According to our knowledge, no such work exists for text augmentation on Indian languages. We carry out binary as well as multi-class text classification to make our results more comparable. We get surprising results as basic data augmentation techniques surpass LLMs.
- [1375] arXiv:2401.13086 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Towards Trustable Language Models: Investigating Information Quality of Large Language ModelsComments: 31 pagesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLM) are generating information at a rapid pace, requiring users to increasingly rely and trust the data. Despite remarkable advances of LLM, Information generated by LLM is not completely trustworthy, due to challenges in information quality. Specifically, integrity of Information quality decreases due to unreliable, biased, tokenization during pre-training of LLM. Moreover, due to decreased information quality issues, has led towards hallucination, fabricated information. Unreliable information can lead towards flawed decisions in businesses, which impacts economic activity. In this work, we introduce novel mathematical information quality evaluation of LLM, we furthermore analyze and highlight information quality challenges, scaling laws to systematically scale language models.
- [1376] arXiv:2401.13097 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Digital Divides in Scene Recognition: Uncovering Socioeconomic Biases in Deep Learning SystemsComments: 20 pages, 3 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Computer-based scene understanding has influenced fields ranging from urban planning to autonomous vehicle performance, yet little is known about how well these technologies work across social differences. We investigate the biases of deep convolutional neural networks (dCNNs) in scene classification, using nearly one million images from global and US sources, including user-submitted home photographs and Airbnb listings. We applied statistical models to quantify the impact of socioeconomic indicators such as family income, Human Development Index (HDI), and demographic factors from public data sources (CIA and US Census) on dCNN performance. Our analyses revealed significant socioeconomic bias, where pretrained dCNNs demonstrated lower classification accuracy, lower classification confidence, and a higher tendency to assign labels that could be offensive when applied to homes (e.g., "ruin", "slum"), especially in images from homes with lower socioeconomic status (SES). This trend is consistent across two datasets of international images and within the diverse economic and racial landscapes of the United States. This research contributes to understanding biases in computer vision, emphasizing the need for more inclusive and representative training datasets. By mitigating the bias in the computer vision pipelines, we can ensure fairer and more equitable outcomes for applied computer vision, including home valuation and smart home security systems. There is urgency in addressing these biases, which can significantly impact critical decisions in urban development and resource allocation. Our findings also motivate the development of AI systems that better understand and serve diverse communities, moving towards technology that equitably benefits all sectors of society.
- [1377] arXiv:2401.13098 (cross-list from cs.LG) [ pdf , other ]
-
Title: Gravity-Informed Deep Learning Framework for Predicting Ship Traffic Flow and Invasion Risk of Non-Indigenous Species via Ballast Water DischargeComments: 26 pages, 7 figures, under reviewSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Applications (stat.AP)
Abstract: Invasive species in water bodies pose a major threat to the environment and biodiversity globally. Due to increased transportation and trade, non-native species have been introduced to new environments, causing damage to ecosystems and leading to economic losses in agriculture, forestry, and fisheries. Therefore, there is a pressing need for risk assessment and management techniques to mitigate the impact of these invasions. This study aims to develop a new physics-inspired model to forecast maritime shipping traffic and thus inform risk assessment of invasive species spread through global transportation networks. Inspired by the gravity model for international trades, our model considers various factors that influence the likelihood and impact of vessel activities, such as shipping flux density, distance between ports, trade flow, and centrality measures of transportation hubs. Additionally, by analyzing the risk network of invasive species, we provide a comprehensive framework for assessing the invasion threat level given a pair of origin and destination. Accordingly, this paper introduces transformers to gravity models to rebuild the short- and long-term dependencies that make the risk analysis feasible. Thus, we introduce a physics-inspired framework that achieves an 89% segmentation accuracy for existing and non-existing trajectories and an 84.8% accuracy for the number of vessels flowing between key port areas, representing more than 10% improvement over the traditional deep-gravity model. Along these lines, this research contributes to a better understanding of invasive species risk assessment. It allows policymakers, conservationists, and stakeholders to prioritize management actions by identifying high-risk invasion pathways. Besides, our model is versatile and can include new data sources, making it suitable for assessing species invasion risks in a changing global landscape.
- [1378] arXiv:2401.13099 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Sparse identification of nonlinear dynamics in the presence of library and system uncertaintyAuthors: Andrew O'BrienSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The SINDy algorithm has been successfully used to identify the governing equations of dynamical systems from time series data. However, SINDy assumes the user has prior knowledge of the variables in the system and of a function library that can act as a basis for the system. In this paper, we demonstrate on real world data how the Augmented SINDy algorithm outperforms SINDy in the presence of system variable uncertainty. We then show SINDy can be further augmented to perform robustly when both kinds of uncertainty are present.
- [1379] arXiv:2401.13136 (cross-list from cs.CL) [ pdf , other ]
-
Title: The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual ContextsAuthors: Lingfeng Shen , Weiting Tan , Sihao Chen , Yunmo Chen , Jingyu Zhang , Haoran Xu , Boyuan Zheng , Philipp Koehn , Daniel KhashabiSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: As the influence of large language models (LLMs) spans across global communities, their safety challenges in multilingual settings become paramount for alignment research. This paper examines the variations in safety challenges faced by LLMs across different languages and discusses approaches to alleviating such concerns. By comparing how state-of-the-art LLMs respond to the same set of malicious prompts written in higher- vs. lower-resource languages, we observe that (1) LLMs tend to generate unsafe responses much more often when a malicious prompt is written in a lower-resource language, and (2) LLMs tend to generate more irrelevant responses to malicious prompts in lower-resource languages. To understand where the discrepancy can be attributed, we study the effect of instruction tuning with reinforcement learning from human feedback (RLHF) or supervised finetuning (SFT) on the HH-RLHF dataset. Surprisingly, while training with high-resource languages improves model alignment, training in lower-resource languages yields minimal improvement. This suggests that the bottleneck of cross-lingual alignment is rooted in the pretraining stage. Our findings highlight the challenges in cross-lingual LLM safety, and we hope they inform future research in this direction.
- [1380] arXiv:2401.13138 (cross-list from cs.CY) [ pdf , other ]
-
Title: Visibility into AI AgentsAuthors: Alan Chan , Carson Ezell , Max Kaufmann , Kevin Wei , Lewis Hammond , Herbie Bradley , Emma Bluemke , Nitarshan Rajkumar , David Krueger , Noam Kolt , Lennart Heim , Markus AnderljungComments: Accepted to ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT 2024)Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Increased delegation of commercial, scientific, governmental, and personal activities to AI agents -- systems capable of pursuing complex goals with limited supervision -- may exacerbate existing societal risks and introduce new risks. Understanding and mitigating these risks involves critically evaluating existing governance structures, revising and adapting these structures where needed, and ensuring accountability of key stakeholders. Information about where, why, how, and by whom certain AI agents are used, which we refer to as visibility, is critical to these objectives. In this paper, we assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging. For each, we outline potential implementations that vary in intrusiveness and informativeness. We analyze how the measures apply across a spectrum of centralized through decentralized deployment contexts, accounting for various actors in the supply chain including hardware and software service providers. Finally, we discuss the implications of our measures for privacy and concentration of power. Further work into understanding the measures and mitigating their negative impacts can help to build a foundation for the governance of AI agents.
- [1381] arXiv:2401.13157 (cross-list from cs.LG) [ pdf , other ]
-
Title: Time-Aware Knowledge Representations of Dynamic Objects with Multidimensional PersistenceJournal-ref: AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Learning time-evolving objects such as multivariate time series and dynamic networks requires the development of novel knowledge representation mechanisms and neural network architectures, which allow for capturing implicit time-dependent information contained in the data. Such information is typically not directly observed but plays a key role in the learning task performance. In turn, lack of time dimension in knowledge encoding mechanisms for time-dependent data leads to frequent model updates, poor learning performance, and, as a result, subpar decision-making. Here we propose a new approach to a time-aware knowledge representation mechanism that notably focuses on implicit time-dependent topological information along multiple geometric dimensions. In particular, we propose a new approach, named \textit{Temporal MultiPersistence} (TMP), which produces multidimensional topological fingerprints of the data by using the existing single parameter topological summaries. The main idea behind TMP is to merge the two newest directions in topological representation learning, that is, multi-persistence which simultaneously describes data shape evolution along multiple key parameters, and zigzag persistence to enable us to extract the most salient data shape information over time. We derive theoretical guarantees of TMP vectorizations and show its utility, in application to forecasting on benchmark traffic flow, Ethereum blockchain, and electrocardiogram datasets, demonstrating the competitive performance, especially, in scenarios of limited data records. In addition, our TMP method improves the computational efficiency of the state-of-the-art multipersistence summaries up to 59.5 times.
- [1382] arXiv:2401.13171 (cross-list from cs.LG) [ pdf , other ]
-
Title: Compositional Generative Inverse DesignAuthors: Tailin Wu , Takashi Maruyama , Long Wei , Tao Zhang , Yilun Du , Gianluca Iaccarino , Jure LeskovecComments: ICLR 2024 spotlight. 30 pages, 17 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Abstract: Inverse design, where we seek to design input variables in order to optimize an underlying objective function, is an important problem that arises across fields such as mechanical engineering to aerospace engineering. Inverse design is typically formulated as an optimization problem, with recent works leveraging optimization across learned dynamics models. However, as models are optimized they tend to fall into adversarial modes, preventing effective sampling. We illustrate that by instead optimizing over the learned energy function captured by the diffusion model, we can avoid such adversarial examples and significantly improve design performance. We further illustrate how such a design system is compositional, enabling us to combine multiple different diffusion models representing subcomponents of our desired system to design systems with every specified component. In an N-body interaction task and a challenging 2D multi-airfoil design task, we demonstrate that by composing the learned diffusion model at test time, our method allows us to design initial states and boundary shapes that are more complex than those in the training data. Our method generalizes to more objects for N-body dataset and discovers formation flying to minimize drag in the multi-airfoil design task. Project website and code can be found at this https URL .
- [1383] arXiv:2401.13178 (cross-list from cs.CL) [ pdf , other ]
-
Title: AgentBoard: An Analytical Evaluation Board of Multi-turn LLM AgentsAuthors: Chang Ma , Junlei Zhang , Zhihao Zhu , Cheng Yang , Yujiu Yang , Yaohui Jin , Zhenzhong Lan , Lingpeng Kong , Junxian HeComments: PreprintSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Evaluating large language models (LLMs) as general-purpose agents is essential for understanding their capabilities and facilitating their integration into practical applications. However, the evaluation process presents substantial challenges. A primary obstacle is the benchmarking of agent performance across diverse scenarios within a unified framework, especially in maintaining partially-observable environments and ensuring multi-round interactions. Moreover, current evaluation frameworks mostly focus on the final success rate, revealing few insights during the process and failing to provide a deep understanding of the model abilities. To address these challenges, we introduce AgentBoard, a pioneering comprehensive benchmark and accompanied open-source evaluation framework tailored to analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric that captures incremental advancements as well as a comprehensive evaluation toolkit that features easy assessment of agents for multi-faceted analysis through interactive visualization. This not only sheds light on the capabilities and limitations of LLM agents but also propels the interpretability of their performance to the forefront. Ultimately, AgentBoard serves as a significant step towards demystifying agent behaviors and accelerating the development of stronger LLM agents.
- [1384] arXiv:2401.13193 (cross-list from cs.CV) [ pdf , other ]
-
Title: Catch-Up Mix: Catch-Up Class for Struggling Filters in CNNComments: Published at AAAI2024, Equal contribution of first two authorsJournal-ref: Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 2024, 2705-2713Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Deep learning has made significant advances in computer vision, particularly in image classification tasks. Despite their high accuracy on training data, deep learning models often face challenges related to complexity and overfitting. One notable concern is that the model often relies heavily on a limited subset of filters for making predictions. This dependency can result in compromised generalization and an increased vulnerability to minor variations. While regularization techniques like weight decay, dropout, and data augmentation are commonly used to address this issue, they may not directly tackle the reliance on specific filters. Our observations reveal that the heavy reliance problem gets severe when slow-learning filters are deprived of learning opportunities due to fast-learning filters. Drawing inspiration from image augmentation research that combats over-reliance on specific image regions by removing and replacing parts of images, our idea is to mitigate the problem of over-reliance on strong filters by substituting highly activated features. To this end, we present a novel method called Catch-up Mix, which provides learning opportunities to a wide range of filters during training, focusing on filters that may lag behind. By mixing activation maps with relatively lower norms, Catch-up Mix promotes the development of more diverse representations and reduces reliance on a small subset of filters. Experimental results demonstrate the superiority of our method in various vision classification datasets, providing enhanced robustness.
- [1385] arXiv:2401.13201 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: MLLMReID: Multimodal Large Language Model-based Person Re-identificationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Multimodal large language models (MLLM) have achieved satisfactory results in many tasks. However, their performance in the task of person re-identification (ReID) has not been explored to date. This paper will investigate how to adapt them for the task of ReID. An intuitive idea is to fine-tune MLLM with ReID image-text datasets, and then use their visual encoder as a backbone for ReID. However, there still exist two apparent issues: (1) Designing instructions for ReID, MLLMs may overfit specific instructions, and designing a variety of instructions will lead to higher costs. (2) Latent image feature vectors from LLMs are not involved in loss computation. Instructional learning, aligning image-text features, results in indirect optimization and a learning objective that inadequately utilizes features, limiting effectiveness in person feature learning. To address these problems, this paper proposes MLLMReID: Multimodal Large Language Model-based ReID. Firstly, we proposed Common Instruction, a simple approach that leverages the essence ability of LLMs to continue writing, avoiding complex and diverse instruction design. Secondly, we proposed DirectReID, which effectively employs the latent image feature vectors of images outputted by LLMs in ReID tasks. The experimental results demonstrate the superiority of our method. We will open-source the code on GitHub.
- [1386] arXiv:2401.13205 (cross-list from cs.CV) [ pdf , other ]
-
Title: Boosting the Transferability of Adversarial Examples via Local Mixup and Adaptive Step SizeSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Adversarial examples are one critical security threat to various visual applications, where injected human-imperceptible perturbations can confuse the output.Generating transferable adversarial examples in the black-box setting is crucial but challenging in practice. Existing input-diversity-based methods adopt different image transformations, but may be inefficient due to insufficient input diversity and an identical perturbation step size. Motivated by the fact that different image regions have distinctive weights in classification, this paper proposes a black-box adversarial generative framework by jointly designing enhanced input diversity and adaptive step sizes. We design local mixup to randomly mix a group of transformed adversarial images, strengthening the input diversity. For precise adversarial generation, we project the perturbation into the $tanh$ space to relax the boundary constraint. Moreover, the step sizes of different regions can be dynamically adjusted by integrating a second-order momentum.Extensive experiments on ImageNet validate that our framework can achieve superior transferability compared to state-of-the-art baselines.
- [1387] arXiv:2401.13212 (cross-list from cs.CV) [ pdf , other ]
-
Title: AdCorDA: Classifier Refinement via Adversarial Correction and Domain AdaptationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper describes a simple yet effective technique for refining a pretrained classifier network. The proposed AdCorDA method is based on modification of the training set and making use of the duality between network weights and layer inputs. We call this input space training. The method consists of two stages - adversarial correction followed by domain adaptation. Adversarial correction uses adversarial attacks to correct incorrect training-set classifications. The incorrectly classified samples of the training set are removed and replaced with the adversarially corrected samples to form a new training set, and then, in the second stage, domain adaptation is performed back to the original training set. Extensive experimental validations show significant accuracy boosts of over 5% on the CIFAR-100 dataset. The technique can be straightforwardly applied to refinement of weight-quantized neural networks, where experiments show substantial enhancement in performance over the baseline. The adversarial correction technique also results in enhanced robustness to adversarial attacks.
- [1388] arXiv:2401.13214 (cross-list from cs.CV) [ pdf , other ]
-
Title: AMANet: Advancing SAR Ship Detection with Adaptive Multi-Hierarchical Attention NetworkComments: 11 pages, 7 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recently, methods based on deep learning have been successfully applied to ship detection for synthetic aperture radar (SAR) images. Despite the development of numerous ship detection methodologies, detecting small and coastal ships remains a significant challenge due to the limited features and clutter in coastal environments. For that, a novel adaptive multi-hierarchical attention module (AMAM) is proposed to learn multi-scale features and adaptively aggregate salient features from various feature layers, even in complex environments. Specifically, we first fuse information from adjacent feature layers to enhance the detection of smaller targets, thereby achieving multi-scale feature enhancement. Then, to filter out the adverse effects of complex backgrounds, we dissect the previously fused multi-level features on the channel, individually excavate the salient regions, and adaptively amalgamate features originating from different channels. Thirdly, we present a novel adaptive multi-hierarchical attention network (AMANet) by embedding the AMAM between the backbone network and the feature pyramid network (FPN). Besides, the AMAM can be readily inserted between different frameworks to improve object detection. Lastly, extensive experiments on two large-scale SAR ship detection datasets demonstrate that our AMANet method is superior to state-of-the-art methods.
- [1389] arXiv:2401.13223 (cross-list from cs.CL) [ pdf , other ]
-
Title: TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual DataComments: ACL 2024 (Under Review)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this work, we address question answering (QA) over a hybrid of tabular and textual data that are very common content on the Web (e.g. SEC filings), where discrete reasoning capabilities are often required. Recently, large language models (LLMs) like GPT-4 have demonstrated strong multi-step reasoning capabilities. We then consider harnessing the amazing power of LLMs to solve our task. We abstract a Step-wise Pipeline for tabular and textual QA, which consists of three key steps, including Extractor, Reasoner and Executor, and initially design an instruction to instantiate the pipeline and validate that GPT-4 outperforms all existing methods. However, utilizing an online LLM like GPT-4 holds various challenges in terms of cost, latency, and data security risk, which motivates us to specialize smaller LLMs in this task. We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets following the Step-wise Pipeline. The experimental results have verified that our TAT-LLM model can outperform all baseline models, including the previous best fine-tuned models and very large-scale LLMs like GPT-4 on FinQA, TAT-QA and TAT-DQA benchmarks.
- [1390] arXiv:2401.13227 (cross-list from cs.CL) [ pdf , other ]
-
Title: LPNL: Scalable Link Prediction with Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract: Exploring the application of large language models (LLMs) to graph learning is a emerging endeavor. However, the vast amount of information inherent in large graphs poses significant challenges to this process. This work focuses on the link prediction task and introduces $\textbf{LPNL}$ (Link Prediction via Natural Language), a framework based on large language models designed for scalable link prediction on large-scale heterogeneous graphs. We design novel prompts for link prediction that articulate graph details in natural language. We propose a two-stage sampling pipeline to extract crucial information from the graphs, and a divide-and-conquer strategy to control the input tokens within predefined limits, addressing the challenge of overwhelming information. We fine-tune a T5 model based on our self-supervised learning designed for link prediction. Extensive experimental results demonstrate that LPNL outperforms multiple advanced baselines in link prediction tasks on large-scale graphs.
- [1391] arXiv:2401.13229 (cross-list from cs.CL) [ pdf , other ]
-
Title: From Random to Informed Data Selection: A Diversity-Based Approach to Optimize Human Annotation and Few-Shot LearningAuthors: Alexandre Alcoforado , Thomas Palmeira Ferraz , Lucas Hideki Okamura , Israel Campos Fama , Arnold Moya Lavado , Bárbara Dias Bueno , Bruno Veloso , Anna Helena Reali CostaComments: Accepted at PROPOR 2024 - The 16th International Conference on Computational Processing of PortugueseSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: A major challenge in Natural Language Processing is obtaining annotated data for supervised learning. An option is the use of crowdsourcing platforms for data annotation. However, crowdsourcing introduces issues related to the annotator's experience, consistency, and biases. An alternative is to use zero-shot methods, which in turn have limitations compared to their few-shot or fully supervised counterparts. Recent advancements driven by large language models show potential, but struggle to adapt to specialized domains with severely limited data. The most common approaches therefore involve the human itself randomly annotating a set of datapoints to build initial datasets. But randomly sampling data to be annotated is often inefficient as it ignores the characteristics of the data and the specific needs of the model. The situation worsens when working with imbalanced datasets, as random sampling tends to heavily bias towards the majority classes, leading to excessive annotated data. To address these issues, this paper contributes an automatic and informed data selection architecture to build a small dataset for few-shot learning. Our proposal minimizes the quantity and maximizes diversity of data selected for human annotation, while improving model performance.
- [1392] arXiv:2401.13256 (cross-list from cs.CL) [ pdf , other ]
-
Title: UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue SystemsAuthors: Hongru Wang , Wenyu Huang , Yang Deng , Rui Wang , Zezhong Wang , Yufei Wang , Fei Mi , Jeff Z. Pan , Kam-Fai WongSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) has shown exceptional capabilities in many natual language understanding and generation tasks. However, the personalization issue still remains a much-coveted property, especially when it comes to the multiple sources involved in the dialogue system. To better plan and incorporate the use of multiple sources in generating personalized response, we firstly decompose it into three sub-tasks: Knowledge Source Selection, Knowledge Retrieval, and Response Generation. We then propose a novel Unified Multi-Source Retrieval-Augmented Generation system (UniMS-RAG) Specifically, we unify these three sub-tasks with different formulations into the same sequence-to-sequence paradigm during the training, to adaptively retrieve evidences and evaluate the relevance on-demand using special tokens, called acting tokens and evaluation tokens. Enabling language models to generate acting tokens facilitates interaction with various knowledge sources, allowing them to adapt their behavior to diverse task requirements. Meanwhile, evaluation tokens gauge the relevance score between the dialogue context and the retrieved evidence. In addition, we carefully design a self-refinement mechanism to iteratively refine the generated response considering 1) the consistency scores between the generated response and retrieved evidence; and 2) the relevance scores. Experiments on two personalized datasets (DuLeMon and KBP) show that UniMS-RAG achieves state-of-the-art performance on the knowledge source selection and response generation task with itself as a retriever in a unified manner. Extensive analyses and discussions are provided for shedding some new perspectives for personalized dialogue systems.
- [1393] arXiv:2401.13262 (cross-list from cs.GT) [ pdf , other ]
-
Title: Designing Redistribution Mechanisms for Reducing Transaction Fees in BlockchainsComments: Full Paper (AAMAS '24)Subjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Blockchains deploy Transaction Fee Mechanisms (TFMs) to determine which user transactions to include in blocks and determine their payments (i.e., transaction fees). Increasing demand and scarce block resources have led to high user transaction fees. As these blockchains are a public resource, it may be preferable to reduce these transaction fees. To this end, we introduce Transaction Fee Redistribution Mechanisms (TFRMs) -- redistributing VCG payments collected from such TFM as rebates to minimize transaction fees. Classic redistribution mechanisms (RMs) achieve this while ensuring Allocative Efficiency (AE) and User Incentive Compatibility (UIC). Our first result shows the non-triviality of applying RM in TFMs. More concretely, we prove that it is impossible to reduce transaction fees when (i) transactions that are not confirmed do not receive rebates and (ii) the miner can strategically manipulate the mechanism. Driven by this, we propose \emph{Robust} TFRM (\textsf{R-TFRM}): a mechanism that compromises on an honest miner's individual rationality to guarantee strictly positive rebates to the users. We then introduce \emph{robust} and \emph{rational} TFRM (\textsf{R}$^2$\textsf{-TFRM}) that uses trusted on-chain randomness that additionally guarantees miner's individual rationality (in expectation) and strictly positive rebates. Our results show that TFRMs provide a promising new direction for reducing transaction fees in public blockchains.
- [1394] arXiv:2401.13270 (cross-list from cs.CV) [ pdf , other ]
-
Title: Audio-Infused Automatic Image Colorization by Exploiting Audio Scene SemanticsAuthors: Pengcheng Zhao , Yanxiang Chen , Yang Zhao , Wei Jia , Zhao Zhang , Ronggang Wang , Richang HongSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Automatic image colorization is inherently an ill-posed problem with uncertainty, which requires an accurate semantic understanding of scenes to estimate reasonable colors for grayscale images. Although recent interaction-based methods have achieved impressive performance, it is still a very difficult task to infer realistic and accurate colors for automatic colorization. To reduce the difficulty of semantic understanding of grayscale scenes, this paper tries to utilize corresponding audio, which naturally contains extra semantic information about the same scene. Specifically, a novel audio-infused automatic image colorization (AIAIC) network is proposed, which consists of three stages. First, we take color image semantics as a bridge and pretrain a colorization network guided by color image semantics. Second, the natural co-occurrence of audio and video is utilized to learn the color semantic correlations between audio and visual scenes. Third, the implicit audio semantic representation is fed into the pretrained network to finally realize the audio-guided colorization. The whole process is trained in a self-supervised manner without human annotation. In addition, an audiovisual colorization dataset is established for training and testing. Experiments demonstrate that audio guidance can effectively improve the performance of automatic colorization, especially for some scenes that are difficult to understand only from visual modality.
- [1395] arXiv:2401.13275 (cross-list from cs.CL) [ pdf , other ]
-
Title: Can AI Assistants Know What They Don't Know?Authors: Qinyuan Cheng , Tianxiang Sun , Xiangyang Liu , Wenwei Zhang , Zhangyue Yin , Shimin Li , Linyang Li , Zhengfu He , Kai Chen , Xipeng QiuComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recently, AI assistants based on large language models (LLMs) show surprising performance in many tasks, such as dialogue, solving math problems, writing code, and using tools. Although LLMs possess intensive world knowledge, they still make factual errors when facing some knowledge intensive tasks, like open-domain question answering. These untruthful responses from the AI assistant may cause significant risks in practical applications. We believe that an AI assistant's refusal to answer questions it does not know is a crucial method for reducing hallucinations and making the assistant truthful. Therefore, in this paper, we ask the question "Can AI assistants know what they don't know and express them through natural language?" To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant, which contains its known and unknown questions, based on existing open-domain question answering datasets. Then we align the assistant with its corresponding Idk dataset and observe whether it can refuse to answer its unknown questions after alignment. Experimental results show that after alignment with Idk datasets, the assistant can refuse to answer most its unknown questions. For questions they attempt to answer, the accuracy is significantly higher than before the alignment.
- [1396] arXiv:2401.13282 (cross-list from cs.LG) [ pdf , other ]
-
Title: RefreshNet: Learning Multiscale Dynamics through Hierarchical RefreshingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Forecasting complex system dynamics, particularly for long-term predictions, is persistently hindered by error accumulation and computational burdens. This study presents RefreshNet, a multiscale framework developed to overcome these challenges, delivering an unprecedented balance between computational efficiency and predictive accuracy. RefreshNet incorporates convolutional autoencoders to identify a reduced order latent space capturing essential features of the dynamics, and strategically employs multiple recurrent neural network (RNN) blocks operating at varying temporal resolutions within the latent space, thus allowing the capture of latent dynamics at multiple temporal scales. The unique "refreshing" mechanism in RefreshNet allows coarser blocks to reset inputs of finer blocks, effectively controlling and alleviating error accumulation. This design demonstrates superiority over existing techniques regarding computational efficiency and predictive accuracy, especially in long-term forecasting. The framework is validated using three benchmark applications: the FitzHugh-Nagumo system, the Reaction-Diffusion equation, and Kuramoto-Sivashinsky dynamics. RefreshNet significantly outperforms state-of-the-art methods in long-term forecasting accuracy and speed, marking a significant advancement in modeling complex systems and opening new avenues in understanding and predicting their behavior.
- [1397] arXiv:2401.13298 (cross-list from cs.CL) [ pdf , other ]
-
Title: Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language ModelsComments: The first work towards explainable harmful meme detection by harnessing advanced LLMsJournal-ref: The ACM Web Conference 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The age of social media is flooded with Internet memes, necessitating a clear grasp and effective identification of harmful ones. This task presents a significant challenge due to the implicit meaning embedded in memes, which is not explicitly conveyed through the surface text and image. However, existing harmful meme detection methods do not present readable explanations that unveil such implicit meaning to support their detection decisions. In this paper, we propose an explainable approach to detect harmful memes, achieved through reasoning over conflicting rationales from both harmless and harmful positions. Specifically, inspired by the powerful capacity of Large Language Models (LLMs) on text generation and reasoning, we first elicit multimodal debate between LLMs to generate the explanations derived from the contradictory arguments. Then we propose to fine-tune a small language model as the debate judge for harmfulness inference, to facilitate multimodal fusion between the harmfulness rationales and the intrinsic multimodal information within memes. In this way, our model is empowered to perform dialectical reasoning over intricate and implicit harm-indicative patterns, utilizing multimodal explanations originating from both harmless and harmful arguments. Extensive experiments on three public meme datasets demonstrate that our harmful meme detection approach achieves much better performance than state-of-the-art methods and exhibits a superior capacity for explaining the meme harmfulness of the model predictions.
- [1398] arXiv:2401.13311 (cross-list from cs.CV) [ pdf , other ]
-
Title: ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent advancements in AI have led to the development of large multimodal models (LMMs) capable of processing complex tasks involving joint reasoning over text and visual content in the image (e.g., navigating maps in public places). This paper introduces ConTextual, a novel benchmark comprising instructions designed explicitly to evaluate LMMs' ability to perform context-sensitive text-rich visual reasoning. ConTextual emphasizes diverse real-world scenarios (e.g., time-reading, navigation, shopping and more) demanding a deeper understanding of the interactions between textual and visual elements. Our findings reveal a significant performance gap of 30.8% between the best-performing LMM, GPT-4V(ision), and human capabilities using human evaluation indicating substantial room for improvement in context-sensitive text-rich visual reasoning. Notably, while GPT-4V excelled in abstract categories like meme and quote interpretation, its overall performance still lagged behind humans. In addition to human evaluations, we also employed automatic evaluation metrics using GPT-4, uncovering similar trends in performance disparities. We also perform a fine-grained evaluation across diverse visual contexts and provide qualitative analysis which provides a robust framework for future advancements in the LMM design. this https URL
- [1399] arXiv:2401.13324 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Information That Matters: Exploring Information Needs of People Affected by Algorithmic DecisionsComments: Main text: 21 pages, 3 figures. Supplementary material is provided. Manuscript submitted for review to IJHCSSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Explanations of AI systems rarely address the information needs of people affected by algorithmic decision-making (ADM). This gap between conveyed information and information that matters to affected stakeholders can impede understanding and adherence to regulatory frameworks such as the AI Act. To address this gap, we present the "XAI Novice Question Bank": A catalog of affected stakeholders' information needs in two ADM use cases (employment prediction and health monitoring), covering the categories data, system context, system usage, and system specifications. Information needs were gathered in an interview study where participants received explanations in response to their inquiries. Participants further reported their understanding and decision confidence, showing that while confidence tended to increase after receiving explanations, participants also met understanding challenges, such as being unable to tell why their understanding felt incomplete. Explanations further influenced participants' perceptions of the systems' risks and benefits, which they confirmed or changed depending on the use case. When risks were perceived as high, participants expressed particular interest in explanations about intention, such as why and to what end a system was put in place. With this work, we aim to support the inclusion of affected stakeholders into explainability by contributing an overview of information and challenges relevant to them when deciding on the adoption of ADM systems. We close by summarizing our findings in a list of six key implications that inform the design of future explanations for affected stakeholder audiences.
- [1400] arXiv:2401.13334 (cross-list from cs.LG) [ pdf , other ]
-
Title: Explainable Bayesian OptimizationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In industry, Bayesian optimization (BO) is widely applied in the human-AI collaborative parameter tuning of cyber-physical systems. However, BO's solutions may deviate from human experts' actual goal due to approximation errors and simplified objectives, requiring subsequent tuning. The black-box nature of BO limits the collaborative tuning process because the expert does not trust the BO recommendations. Current explainable AI (XAI) methods are not tailored for optimization and thus fall short of addressing this gap. To bridge this gap, we propose TNTRules (TUNE-NOTUNE Rules), a post-hoc, rule-based explainability method that produces high quality explanations through multiobjective optimization. Our evaluation of benchmark optimization problems and real-world hyperparameter optimization tasks demonstrates TNTRules' superiority over state-of-the-art XAI methods in generating high quality explanations. This work contributes to the intersection of BO and XAI, providing interpretable optimization techniques for real-world applications.
- [1401] arXiv:2401.13346 (cross-list from cs.NI) [ pdf , other ]
-
Title: Past, Present, Future: A Comprehensive Exploration of AI Use Cases in the UMBRELLA IoT TestbedComments: 6 pgaes, 4 figures. This work has been accepted by PerCom TrustSense workshop 2024Subjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
Abstract: UMBRELLA is a large-scale, open-access Internet of Things (IoT) ecosystem incorporating over 200 multi-sensor multi-wireless nodes, 20 collaborative robots, and edge-intelligence-enabled devices. This paper provides a guide to the implemented and prospective artificial intelligence (AI) capabilities of UMBRELLA in real-world IoT systems. Four existing UMBRELLA applications are presented in detail: 1) An automated streetlight monitoring for detecting issues and triggering maintenance alerts; 2) A Digital twin of building environments providing enhanced air quality sensing with reduced cost; 3) A large-scale Federated Learning framework for reducing communication overhead; and 4) An intrusion detection for containerised applications identifying malicious activities. Additionally, the potential of UMBRELLA is outlined for future smart city and multi-robot crowdsensing applications enhanced by semantic communications and multi-agent planning. Finally, to realise the above use-cases we discuss the need for a tailored MLOps platform to automate UMBRELLA model pipelines and establish trust.
- [1402] arXiv:2401.13432 (cross-list from cs.CV) [ pdf , other ]
-
Title: Semi-Supervised Coupled Thin-Plate Spline Model for Rotation Correction and BeyondSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Thin-plate spline (TPS) is a principal warp that allows for representing elastic, nonlinear transformation with control point motions. With the increase of control points, the warp becomes increasingly flexible but usually encounters a bottleneck caused by undesired issues, e.g., content distortion. In this paper, we explore generic applications of TPS in single-image-based warping tasks, such as rotation correction, rectangling, and portrait correction. To break this bottleneck, we propose the coupled thin-plate spline model (CoupledTPS), which iteratively couples multiple TPS with limited control points into a more flexible and powerful transformation. Concretely, we first design an iterative search to predict new control points according to the current latent condition. Then, we present the warping flow as a bridge for the coupling of different TPS transformations, effectively eliminating interpolation errors caused by multiple warps. Besides, in light of the laborious annotation cost, we develop a semi-supervised learning scheme to improve warping quality by exploiting unlabeled data. It is formulated through dual transformation between the searched control points of unlabeled data and its graphic augmentation, yielding an implicit correction consistency constraint. Finally, we collect massive unlabeled data to exhibit the benefit of our semi-supervised scheme in rotation correction. Extensive experiments demonstrate the superiority and universality of CoupledTPS over the existing state-of-the-art (SoTA) solutions for rotation correction and beyond. The code and data will be available at this https URL .
- [1403] arXiv:2401.13444 (cross-list from cs.CL) [ pdf , other ]
-
Title: Clue-Guided Path Exploration: An Efficient Knowledge Base Question-Answering Framework with Low Computational Resource ConsumptionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In recent times, large language models (LLMs) have showcased remarkable capabilities. However, updating their knowledge poses challenges, potentially leading to inaccuracies when confronted with unfamiliar queries. While integrating knowledge graphs with LLMs has been explored, existing approaches treat LLMs as primary decision-makers, imposing high demands on their capabilities. This is particularly unsuitable for LLMs with lower computational costs and relatively poorer performance. In this paper, we introduce a Clue-Guided Path Exploration framework (CGPE) that efficiently merges a knowledge base with an LLM, placing less stringent requirements on the model's capabilities. Inspired by the method humans use to manually retrieve knowledge, CGPE employs information from the question as clues to systematically explore the required knowledge path within the knowledge base. Experiments on open-source datasets reveal that CGPE outperforms previous methods and is highly applicable to LLMs with fewer parameters. In some instances, even ChatGLM3, with its 6 billion parameters, can rival the performance of GPT-4. Furthermore, the results indicate a minimal invocation frequency of CGPE on LLMs, suggesting reduced computational overhead. For organizations and individuals facing constraints in computational resources, our research offers significant practical value.
- [1404] arXiv:2401.13460 (cross-list from cs.LG) [ pdf , other ]
-
Title: Multi-Agent Diagnostics for Robustness via Illuminated DiversitySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Notwithstanding their outstanding performance in familiar environments, these systems often falter in new situations due to overfitting during the training phase. This is especially pronounced in settings where both cooperative and competitive behaviours are present, encapsulating a dual nature of overfitting and generalisation challenges. To address this issue, we present Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID), a novel approach for generating diverse adversarial scenarios that expose strategic vulnerabilities in pre-trained multi-agent policies. Leveraging the concepts from open-ended learning, MADRID navigates the vast space of adversarial settings, employing a target policy's regret to gauge the vulnerabilities of these settings. We evaluate the effectiveness of MADRID on the 11vs11 version of Google Research Football, one of the most complex environments for multi-agent reinforcement learning. Specifically, we employ MADRID for generating a diverse array of adversarial settings for TiZero, the state-of-the-art approach which "masters" the game through 45 days of training on a large-scale distributed infrastructure. We expose key shortcomings in TiZero's tactical decision-making, underlining the crucial importance of rigorous evaluation in multi-agent systems.
- [1405] arXiv:2401.13462 (cross-list from cs.RO) [ pdf , other ]
-
Title: Growing from Exploration: A self-exploring framework for robots based on foundation modelsComments: 19 pagesSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Intelligent robot is the ultimate goal in the robotics field. Existing works leverage learning-based or optimization-based methods to accomplish human-defined tasks. However, the challenge of enabling robots to explore various environments autonomously remains unresolved. In this work, we propose a framework named GExp, which enables robots to explore and learn autonomously without human intervention. To achieve this goal, we devise modules including self-exploration, knowledge-base-building, and close-loop feedback based on foundation models. Inspired by the way that infants interact with the world, GExp encourages robots to understand and explore the environment with a series of self-generated tasks. During the process of exploration, the robot will acquire skills from beneficial experiences that are useful in the future. GExp provides robots with the ability to solve complex tasks through self-exploration. GExp work is independent of prior interactive knowledge and human intervention, allowing it to adapt directly to different scenarios, unlike previous studies that provided in-context examples as few-shot learning. In addition, we propose a workflow of deploying the real-world robot system with self-learned skills as an embodied assistant.
- [1406] arXiv:2401.13481 (cross-list from cs.CY) [ pdf , other ]
-
Title: How AI Ideas Affect the Creativity, Diversity, and Evolution of Human Ideas: Evidence From a Large, Dynamic ExperimentSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: Exposure to large language model output is rapidly increasing. How will seeing AI-generated ideas affect human ideas? We conducted an experiment (800+ participants, 40+ countries) where participants viewed creative ideas that were from ChatGPT or prior experimental participants and then brainstormed their own idea. We varied the number of AI-generated examples (none, low, or high exposure) and if the examples were labeled as 'AI' (disclosure). Our dynamic experiment design -- ideas from prior participants in an experimental condition are used as stimuli for future participants in the same experimental condition -- mimics the interdependent process of cultural creation: creative ideas are built upon prior ideas. Hence, we capture the compounding effects of having LLMs 'in the culture loop'. We find that high AI exposure (but not low AI exposure) did not affect the creativity of individual ideas but did increase the average amount and rate of change of collective idea diversity. AI made ideas different, not better. There were no main effects of disclosure. We also found that self-reported creative people were less influenced by knowing an idea was from AI, and that participants were more likely to knowingly adopt AI ideas when the task was difficult. Our findings suggest that introducing AI ideas into society may increase collective diversity but not individual creativity.
- [1407] arXiv:2401.13486 (cross-list from math.NA) [ pdf , other ]
-
Title: Separable Physics-Informed Neural Networks for the solution of elasticity problemsAuthors: Vasiliy A. Es'kin , Danil V. Davydov , Julia V. Gur'eva , Alexey O. Malkhanov , Mikhail E. SmorkalovSubjects: Numerical Analysis (math.NA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Applied Physics (physics.app-ph)
Abstract: A method for solving elasticity problems based on separable physics-informed neural networks (SPINN) in conjunction with the deep energy method (DEM) is presented. Numerical experiments have been carried out for a number of problems showing that this method has a significantly higher convergence rate and accuracy than the vanilla physics-informed neural networks (PINN) and even SPINN based on a system of partial differential equations (PDEs). In addition, using the SPINN in the framework of DEM approach it is possible to solve problems of the linear theory of elasticity on complex geometries, which is unachievable with the help of PINNs in frames of partial differential equations. Considered problems are very close to the industrial problems in terms of geometry, loading, and material parameters.
- [1408] arXiv:2401.13498 (cross-list from cs.SD) [ pdf , other ]
-
Title: Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion OutpaintingComments: Accepted to ICASSP 2024Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Abstract: Synthesizing performing guitar sound is a highly challenging task due to the polyphony and high variability in expression. Recently, deep generative models have shown promising results in synthesizing expressive polyphonic instrument sounds from music scores, often using a generic MIDI input. In this work, we propose an expressive acoustic guitar sound synthesis model with a customized input representation to the instrument, which we call guitarroll. We implement the proposed approach using diffusion-based outpainting which can generate audio with long-term consistency. To overcome the lack of MIDI/audio-paired datasets, we used not only an existing guitar dataset but also collected data from a high quality sample-based guitar synthesizer. Through quantitative and qualitative evaluations, we show that our proposed model has higher audio quality than the baseline model and generates more realistic timbre sounds than the previous leading work.
- [1409] arXiv:2401.13555 (cross-list from cs.CV) [ pdf , other ]
-
Title: Benchmarking the Fairness of Image Upsampling MethodsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent years have witnessed a rapid development of deep generative models for creating synthetic media, such as images and videos. While the practical applications of these models in everyday tasks are enticing, it is crucial to assess the inherent risks regarding their fairness. In this work, we introduce a comprehensive framework for benchmarking the performance and fairness of conditional generative models. We develop a set of metrics$\unicode{x2013}$inspired by their supervised fairness counterparts$\unicode{x2013}$to evaluate the models on their fairness and diversity. Focusing on the specific application of image upsampling, we create a benchmark covering a wide variety of modern upsampling methods. As part of the benchmark, we introduce UnfairFace, a subset of FairFace that replicates the racial distribution of common large-scale face datasets. Our empirical study highlights the importance of using an unbiased training set and reveals variations in how the algorithms respond to dataset imbalances. Alarmingly, we find that none of the considered methods produces statistically fair and diverse results.
- [1410] arXiv:2401.13586 (cross-list from cs.LG) [ pdf , other ]
-
Title: Instruction Fine-Tuning: Does Prompt Loss Matter?Authors: Mathew Huerta-EnochianComments: 8 pages of content. 12 pages supporting. 30 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We present a study analyzing the effects of prompt loss weighting (PLW) on supervised instruction fine-tuning. We recreated Stanford's Alpaca experiment with both LLaMA 1 and LLaMA 2 and multiple instruction datasets. We found that performance of models fine-tuned on our short-completion dataset had a statistically significant negative quadratic relationship with PLW, but performance of models fine-tuned on medium- and long-completion data did not show any relationship with PLW. I.e., prompt loss can be safely ignored for many datasets. For short-completion data, small values (0.01-0.1) of PLW were optimal for multiple-choice and short-generation tasks while large values (~ 1.0) of PLW were optimal for long-generation tasks. We concluded that low non-zero PLW encourages models to not diverge from pre-trained model weights during training and high PLW reduces overfitting. Finally, we present a rough guide for selecting PLW values based on the completion-prompt length ratio of fine-tuning data.
- [1411] arXiv:2401.13588 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record NotesAuthors: Darren Liu , Cheng Ding , Delgersuren Bold , Monique Bouvier , Jiaying Lu , Benjamin Shickel , Craig S. Jabaley , Wenhui Zhang , Soojin Park , Michael J. Young , Mark S. Wainwright , Gilles Clermont , Parisa Rashidi , Eric S. Rosenthal , Laurie Dimisko , Ran Xiao , Joo Heung Yoon , Carl Yang , Xiao HuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: The field of healthcare has increasingly turned its focus towards Large Language Models (LLMs) due to their remarkable performance. However, their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks don't fully capture the nuanced contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in real-world healthcare settings. Objective: We sought to evaluate the performance of LLMs in the complex clinical context of adult critical care medicine using systematic and comprehensible analytic methods, including clinician annotation and adjudication. Methods: We investigated the performance of three general LLMs in understanding and processing real-world clinical notes. Concepts from 150 clinical notes were identified by MetaMap and then labeled by 9 clinicians. Each LLM's proficiency was evaluated by identifying the temporality and negation of these concepts using different prompts for an in-depth analysis. Results: GPT-4 showed overall superior performance compared to other LLMs. In contrast, both GPT-3.5 and text-davinci-003 exhibit enhanced performance when the appropriate prompting strategies are employed. The GPT family models have demonstrated considerable efficiency, evidenced by their cost-effectiveness and time-saving capabilities. Conclusion: A comprehensive qualitative performance evaluation framework for LLMs is developed and operationalized. This framework goes beyond singular performance aspects. With expert annotations, this methodology not only validates LLMs' capabilities in processing complex medical data but also establishes a benchmark for future LLM evaluations across specialized domains.
- [1412] arXiv:2401.13594 (cross-list from cs.CL) [ pdf , other ]
-
Title: Graph Guided Question Answer Generation for Procedural Question-AnsweringAuthors: Hai X. Pham , Isma Hadji , Xinnuo Xu , Ziedune Degutyte , Jay Rainey , Evangelos Kazakos , Afsaneh Fazly , Georgios Tzimiropoulos , Brais MartinezComments: Accepted to EACL 2024 as long paper. 25 pages including appendixSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we focus on task-specific question answering (QA). To this end, we introduce a method for generating exhaustive and high-quality training data, which allows us to train compact (e.g., run on a mobile device), task-specific QA models that are competitive against GPT variants. The key technological enabler is a novel mechanism for automatic question-answer generation from procedural text which can ingest large amounts of textual instructions and produce exhaustive in-domain QA training data. While current QA data generation methods can produce well-formed and varied data, their non-exhaustive nature is sub-optimal for training a QA model. In contrast, we leverage the highly structured aspect of procedural text and represent each step and the overall flow of the procedure as graphs. We then condition on graph nodes to automatically generate QA pairs in an exhaustive and controllable manner. Comprehensive evaluations of our method show that: 1) small models trained with our data achieve excellent performance on the target QA task, even exceeding that of GPT3 and ChatGPT despite being several orders of magnitude smaller. 2) semantic coverage is the key indicator for downstream QA performance. Crucially, while large language models excel at syntactic diversity, this does not necessarily result in improvements on the end QA model. In contrast, the higher semantic coverage provided by our method is critical for QA performance.
- [1413] arXiv:2401.13611 (cross-list from cs.SD) [ pdf , ps , other ]
-
Title: Non-Intrusive Speech Intelligibility Prediction for Hearing-Impaired Users using Intermediate ASR Features and Human Memory ModelsAuthors: Rhiannon Mogridge , George Close , Robert Sutherland , Thomas Hain , Jon Barker , Stefan Goetze , Anton RagniComments: Accepted paper. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: Neural networks have been successfully used for non-intrusive speech intelligibility prediction. Recently, the use of feature representations sourced from intermediate layers of pre-trained self-supervised and weakly-supervised models has been found to be particularly useful for this task. This work combines the use of Whisper ASR decoder layer representations as neural network input features with an exemplar-based, psychologically motivated model of human memory to predict human intelligibility ratings for hearing-aid users. Substantial performance improvement over an established intrusive HASPI baseline system is found, including on enhancement systems and listeners unseen in the training data, with a root mean squared error of 25.3 compared with the baseline of 28.7.
- [1414] arXiv:2401.13613 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Enhancing Image Retrieval : A Comprehensive Study on Photo Search using the CLIP ModeSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Photo search, the task of retrieving images based on textual queries, has witnessed significant advancements with the introduction of CLIP (Contrastive Language-Image Pretraining) model. CLIP leverages a vision-language pre training approach, wherein it learns a shared representation space for images and text, enabling cross-modal understanding. This model demonstrates the capability to understand the semantic relationships between diverse image and text pairs, allowing for efficient and accurate retrieval of images based on natural language queries. By training on a large-scale dataset containing images and their associated textual descriptions, CLIP achieves remarkable generalization, providing a powerful tool for tasks such as zero-shot learning and few-shot classification. This abstract summarizes the foundational principles of CLIP and highlights its potential impact on advancing the field of photo search, fostering a seamless integration of natural language understanding and computer vision for improved information retrieval in multimedia applications
- [1415] arXiv:2401.13641 (cross-list from cs.CV) [ pdf , other ]
-
Title: How Good is ChatGPT at Face Biometrics? A First Look into Recognition, Soft Biometrics, and ExplainabilityAuthors: Ivan DeAndres-Tame , Ruben Tolosana , Ruben Vera-Rodriguez , Aythami Morales , Julian Fierrez , Javier Ortega-GarciaJournal-ref: IEEE Access, February 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) such as GPT developed by OpenAI, have already shown astonishing results, introducing quick changes in our society. This has been intensified by the release of ChatGPT which allows anyone to interact in a simple conversational way with LLMs, without any experience in the field needed. As a result, ChatGPT has been rapidly applied to many different tasks such as code- and song-writer, education, virtual assistants, etc., showing impressive results for tasks for which it was not trained (zero-shot learning).
The present study aims to explore the ability of ChatGPT, based on the recent GPT-4 multimodal LLM, for the task of face biometrics. In particular, we analyze the ability of ChatGPT to perform tasks such as face verification, soft-biometrics estimation, and explainability of the results. ChatGPT could be very valuable to further increase the explainability and transparency of automatic decisions in human scenarios. Experiments are carried out in order to evaluate the performance and robustness of ChatGPT, using popular public benchmarks and comparing the results with state-of-the-art methods in the field. The results achieved in this study show the potential of LLMs such as ChatGPT for face biometrics, especially to enhance explainability. For reproducibility reasons, we release all the code in GitHub. - [1416] arXiv:2401.13652 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph-Informed Neural Networks for Sparse Grid-Based Discontinuity DetectorsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
Abstract: In this paper, we present a novel approach for detecting the discontinuity interfaces of a discontinuous function. This approach leverages Graph-Informed Neural Networks (GINNs) and sparse grids to address discontinuity detection also in domains of dimension larger than 3. GINNs, trained to identify troubled points on sparse grids, exploit graph structures built on the grids to achieve efficient and accurate discontinuity detection performances. We also introduce a recursive algorithm for general sparse grid-based detectors, characterized by convergence properties and easy applicability. Numerical experiments on functions with dimensions n = 2 and n = 4 demonstrate the efficiency and robust generalization of GINNs in detecting discontinuity interfaces. Notably, the trained GINNs offer portability and versatility, allowing integration into various algorithms and sharing among users.
- [1417] arXiv:2401.13657 (cross-list from cs.LG) [ pdf , other ]
-
Title: Inadequacy of common stochastic neural networks for reliable clinical decision supportComments: Keywords: probabilistic inference, uncertainty estimation, uncertainty quantification, epistemic uncertainty, clinical prognosis, electronic health recordsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Widespread adoption of AI for medical decision making is still hindered due to ethical and safety-related concerns. For AI-based decision support systems in healthcare settings it is paramount to be reliable and trustworthy. Common deep learning approaches, however, have the tendency towards overconfidence under data shift. Such inappropriate extrapolation beyond evidence-based scenarios may have dire consequences. This highlights the importance of reliable estimation of local uncertainty and its communication to the end user. While stochastic neural networks have been heralded as a potential solution to these issues, this study investigates their actual reliability in clinical applications. We centered our analysis on the exemplary use case of mortality prediction for ICU hospitalizations using EHR from MIMIC3 study. For predictions on the EHR time series, Encoder-Only Transformer models were employed. Stochasticity of model functions was achieved by incorporating common methods such as Bayesian neural network layers and model ensembles. Our models achieve state of the art performance in terms of discrimination performance (AUC ROC: 0.868+-0.011, AUC PR: 0.554+-0.034) and calibration on the mortality prediction benchmark. However, epistemic uncertainty is critically underestimated by the selected stochastic deep learning methods. A heuristic proof for the responsible collapse of the posterior distribution is provided. Our findings reveal the inadequacy of commonly used stochastic deep learning approaches to reliably recognize OoD samples. In both methods, unsubstantiated model confidence is not prevented due to strongly biased functional posteriors, rendering them inappropriate for reliable clinical decision support. This highlights the need for approaches with more strictly enforced or inherent distance-awareness to known data points, e.g., using kernel-based techniques.
- [1418] arXiv:2401.13662 (cross-list from cs.LG) [ pdf , other ]
-
Title: The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and ImplementationsAuthors: Matthias LehmannSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, various powerful policy gradient algorithms have been proposed in deep reinforcement learning. While all these algorithms build on the Policy Gradient Theorem, the specific design choices differ significantly across algorithms. We provide a holistic overview of on-policy policy gradient algorithms to facilitate the understanding of both their theoretical foundations and their practical implementations. In this overview, we include a detailed proof of the continuous version of the Policy Gradient Theorem, convergence results and a comprehensive discussion of practical algorithms. We compare the most prominent algorithms on continuous control environments and provide insights on the benefits of regularization. All code is available at this https URL .
- [1419] arXiv:2401.13672 (cross-list from cs.DB) [ pdf , other ]
-
Title: Transforming Agriculture with Intelligent Data Management and InsightsSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Modern agriculture faces grand challenges to meet increased demands for food, fuel, feed, and fiber with population growth under the constraints of climate change and dwindling natural resources. Data innovation is urgently required to secure and improve the productivity, sustainability, and resilience of our agroecosystems. As various sensors and Internet of Things (IoT) instrumentation become more available, affordable, reliable, and stable, it has become possible to conduct data collection, integration, and analysis at multiple temporal and spatial scales, in real-time, and with high resolutions. At the same time, the sheer amount of data poses a great challenge to data storage and analysis, and the \textit{de facto} data management and analysis practices adopted by scientists have become increasingly inefficient. Additionally, the data generated from different disciplines, such as genomics, phenomics, environment, agronomy, and socioeconomic, can be highly heterogeneous. That is, datasets across disciplines often do not share the same ontology, modality, or format. All of the above make it necessary to design a new data management infrastructure that implements the principles of Findable, Accessible, Interoperable, and Reusable (FAIR). In this paper, we propose Agriculture Data Management and Analytics (ADMA), which satisfies the FAIR principles. Our new data management infrastructure is intelligent by supporting semantic data management across disciplines, interactive by providing various data management/analysis portals such as web GUI, command line, and API, scalable by utilizing the power of high-performance computing (HPC), extensible by allowing users to load their own data analysis tools, trackable by keeping track of different operations on each file, and open by using a rich set of mature open source technologies.
- [1420] arXiv:2401.13677 (cross-list from cs.DB) [ pdf , other ]
-
Title: Process Mining for Unstructured Data: Challenges and Research DirectionsAuthors: Agnes Koschmider , Milda Aleknonytė-Resch , Frederik Fonger , Christian Imenkamp , Arvid Lepsien , Kaan Apaydin , Maximilian Harms , Dominik Janssen , Dominic Langhammer , Tobias Ziolkowski , Yorck ZisgenSubjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The application of process mining for unstructured data might significantly elevate novel insights into disciplines where unstructured data is a common data format. To efficiently analyze unstructured data by process mining and to convey confidence into the analysis result, requires bridging multiple challenges. The purpose of this paper is to discuss these challenges, present initial solutions and describe future research directions. We hope that this article lays the foundations for future collaboration on this topic.
- [1421] arXiv:2401.13693 (cross-list from cs.OH) [ pdf , other ]
-
Title: Challenge design roadmapAuthors: Hugo Jair Escalante Balderas , Isabelle Guyon (LISN, TAU), Addison Howard , Walter Reade , Sebastien Treguer (TAU)Journal-ref: AI Competitions and Benchmarks: The Science Behind the Contests, In pressSubjects: Other Computer Science (cs.OH) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Challenges can be seen as a type of game that motivates participants to solve serious tasks. As a result, competition organizers must develop effective game rules. However, these rules have multiple objectives beyond making the game enjoyable for participants. These objectives may include solving real-world problems, advancing scientific or technical areas, making scientific discoveries, and educating the public. In many ways, creating a challenge is similar to launching a product. It requires the same level of excitement and rigorous testing, and the goal is to attract ''customers'' in the form of participants. The process begins with a solid plan, such as a competition proposal that will eventually be submitted to an international conference and subjected to peer review. Although peer review does not guarantee quality, it does force organizers to consider the impact of their challenge, identify potential oversights, and generally improve its quality. This chapter provides guidelines for creating a strong plan for a challenge. The material draws on the preparation guidelines from organizations such as Kaggle 1 , ChaLearn 2 and Tailor 3 , as well as the NeurIPS proposal template, which some of the authors contributed to.
- [1422] arXiv:2401.13697 (cross-list from cs.CV) [ pdf , other ]
-
Title: Toward Robust Multimodal Learning using Multimodal Foundational ModelsComments: Under ReviewSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Existing multimodal sentiment analysis tasks are highly rely on the assumption that the training and test sets are complete multimodal data, while this assumption can be difficult to hold: the multimodal data are often incomplete in real-world scenarios. Therefore, a robust multimodal model in scenarios with randomly missing modalities is highly preferred. Recently, CLIP-based multimodal foundational models have demonstrated impressive performance on numerous multimodal tasks by learning the aligned cross-modal semantics of image and text pairs, but the multimodal foundational models are also unable to directly address scenarios involving modality absence. To alleviate this issue, we propose a simple and effective framework, namely TRML, Toward Robust Multimodal Learning using Multimodal Foundational Models. TRML employs generated virtual modalities to replace missing modalities, and aligns the semantic spaces between the generated and missing modalities. Concretely, we design a missing modality inference module to generate virtual modaliites and replace missing modalities. We also design a semantic matching learning module to align semantic spaces generated and missing modalities. Under the prompt of complete modality, our model captures the semantics of missing modalities by leveraging the aligned cross-modal semantic space. Experiments demonstrate the superiority of our approach on three multimodal sentiment analysis benchmark datasets, CMU-MOSI, CMU-MOSEI, and MELD.
- [1423] arXiv:2401.13699 (cross-list from cs.HC) [ pdf , other ]
-
Title: Generative AI-Driven Human Digital Twin in IoT-Healthcare: A Comprehensive SurveySubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The Internet of things (IoT) can significantly enhance the quality of human life, specifically in healthcare, attracting extensive attentions to IoT-healthcare services. Meanwhile, the human digital twin (HDT) is proposed as an innovative paradigm that can comprehensively characterize the replication of the individual human body in the digital world and reflect its physical status in real time. Naturally, HDT is envisioned to empower IoT-healthcare beyond the application of healthcare monitoring by acting as a versatile and vivid human digital testbed, simulating the outcomes and guiding the practical treatments. However, successfully establishing HDT requires high-fidelity virtual modeling and strong information interactions but possibly with scarce, biased and noisy data. Fortunately, a recent popular technology called generative artificial intelligence (GAI) may be a promising solution because it can leverage advanced AI algorithms to automatically create, manipulate, and modify valuable while diverse data. This survey particularly focuses on the implementation of GAI-driven HDT in IoT-healthcare. We start by introducing the background of IoT-healthcare and the potential of GAI-driven HDT. Then, we delve into the fundamental techniques and present the overall framework of GAI-driven HDT. After that, we explore the realization of GAI-driven HDT in detail, including GAI-enabled data acquisition, communication, data management, digital modeling, and data analysis. Besides, we discuss typical IoT-healthcare applications that can be revolutionized by GAI-driven HDT, namely personalized health monitoring and diagnosis, personalized prescription, and personalized rehabilitation. Finally, we conclude this survey by highlighting some future research directions.
- [1424] arXiv:2401.13700 (cross-list from cs.LO) [ pdf , other ]
-
Title: Towards Automated Readable Proofs of Ruler and Compass ConstructionsAuthors: Vesna Marinković (Faculty of Mathematics, University of Belgrade), Tijana Šukilović (Faculty of Mathematics, University of Belgrade), Filip Marić (Faculty of Mathematics, University of Belgrade)Comments: In Proceedings ADG 2023, arXiv:2401.10725Journal-ref: EPTCS 398, 2024, pp. 11-20Subjects: Logic in Computer Science (cs.LO) ; Artificial Intelligence (cs.AI)
Abstract: Although there are several systems that successfully generate construction steps for ruler and compass construction problems, none of them provides readable synthetic correctness proofs for generated constructions. In the present work, we demonstrate how our triangle construction solver ArgoTriCS can cooperate with automated theorem provers for first order logic and coherent logic so that it generates construction correctness proofs, that are both human-readable and formal (can be checked by interactive theorem provers such as Coq or Isabelle/HOL). These proofs currently rely on many high-level lemmas and our goal is to have them all formally shown from the basic axioms of geometry.
- [1425] arXiv:2401.13704 (cross-list from cs.CY) [ pdf , other ]
-
Title: Using Java Geometry Expert as Guide in the Preparations for Math ContestsAuthors: Ines Ganglmayr (The Private University College of Education of the Diocese of Linz, Austria), Zoltán Kovács (The Private University College of Education of the Diocese of Linz, Austria)Comments: In Proceedings ADG 2023, arXiv:2401.10725Journal-ref: EPTCS 398, 2024, pp. 124-131Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computational Geometry (cs.CG); Symbolic Computation (cs.SC)
Abstract: We give an insight into Java Geometry Expert (JGEX) in use in a school context, focusing on the Austrian school system. JGEX can offer great support in some classroom situations, especially for solving mathematical competition tasks. Also, we discuss some limitations of the program.
- [1426] arXiv:2401.13708 (cross-list from cs.HC) [ pdf , other ]
-
Title: Accelerating hyperbolic t-SNEAuthors: Martin Skrodzki , Hunter van Geffen , Nicolas F. Chaves-de-Plaza , Thomas Höllt , Elmar Eisemann , Klaus HildebrandtSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Abstract: The need to understand the structure of hierarchical or high-dimensional data is present in a variety of fields. Hyperbolic spaces have proven to be an important tool for embedding computations and analysis tasks as their non-linear nature lends itself well to tree or graph data. Subsequently, they have also been used in the visualization of high-dimensional data, where they exhibit increased embedding performance. However, none of the existing dimensionality reduction methods for embedding into hyperbolic spaces scale well with the size of the input data. That is because the embeddings are computed via iterative optimization schemes and the computation cost of every iteration is quadratic in the size of the input. Furthermore, due to the non-linear nature of hyperbolic spaces, Euclidean acceleration structures cannot directly be translated to the hyperbolic setting. This paper introduces the first acceleration structure for hyperbolic embeddings, building upon a polar quadtree. We compare our approach with existing methods and demonstrate that it computes embeddings of similar quality in significantly less time. Implementation and scripts for the experiments can be found at this https URL .
- [1427] arXiv:2401.13713 (cross-list from cs.LG) [ pdf , other ]
-
Title: EMP: Effective Multidimensional Persistence for Graph Representation LearningAuthors: Ignacio Segovia-Dominguez , Yuzhou Chen , Cuneyt G. Akcora , Zhiwei Zhen , Murat Kantarcioglu , Yulia R. Gel , Baris CoskunuzerComments: arXiv admin note: text overlap with arXiv:2401.13157Journal-ref: LoG 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Geometry (cs.CG)
Abstract: Topological data analysis (TDA) is gaining prominence across a wide spectrum of machine learning tasks that spans from manifold learning to graph classification. A pivotal technique within TDA is persistent homology (PH), which furnishes an exclusive topological imprint of data by tracing the evolution of latent structures as a scale parameter changes. Present PH tools are confined to analyzing data through a single filter parameter. However, many scenarios necessitate the consideration of multiple relevant parameters to attain finer insights into the data. We address this issue by introducing the Effective Multidimensional Persistence (EMP) framework. This framework empowers the exploration of data by simultaneously varying multiple scale parameters. The framework integrates descriptor functions into the analysis process, yielding a highly expressive data summary. It seamlessly integrates established single PH summaries into multidimensional counterparts like EMP Landscapes, Silhouettes, Images, and Surfaces. These summaries represent data's multidimensional aspects as matrices and arrays, aligning effectively with diverse ML models. We provide theoretical guarantees and stability proofs for EMP summaries. We demonstrate EMP's utility in graph classification tasks, showing its effectiveness. Results reveal that EMP enhances various single PH descriptors, outperforming cutting-edge methods on multiple benchmark datasets.
- [1428] arXiv:2401.13716 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Can I trust my fake data -- A comprehensive quality assessment framework for synthetic tabular data in healthcareAuthors: Vibeke Binz Vallevik , Aleksandar Babic , Serena Elizabeth Marshall , Severin Elvatun , Helga Brøgger , Sharmini Alagaratnam , Bjørn Edwin , Narasimha Raghavan Veeraragavan , Anne Kjersti Befring , Jan Franz NygårdJournal-ref: Int. J. Med. Inform.185 (2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Ensuring safe adoption of AI tools in healthcare hinges on access to sufficient data for training, testing and validation. In response to privacy concerns and regulatory requirements, using synthetic data has been suggested. Synthetic data is created by training a generator on real data to produce a dataset with similar statistical properties. Competing metrics with differing taxonomies for quality evaluation have been suggested, resulting in a complex landscape. Optimising quality entails balancing considerations that make the data fit for use, yet relevant dimensions are left out of existing frameworks. We performed a comprehensive literature review on the use of quality evaluation metrics on SD within the scope of tabular healthcare data and SD made using deep generative methods. Based on this and the collective team experiences, we developed a conceptual framework for quality assurance. The applicability was benchmarked against a practical case from the Dutch National Cancer Registry. We present a conceptual framework for quality assurance of SD for AI applications in healthcare that aligns diverging taxonomies, expands on common quality dimensions to include the dimensions of Fairness and Carbon footprint, and proposes stages necessary to support real-life applications. Building trust in synthetic data by increasing transparency and reducing the safety risk will accelerate the development and uptake of trustworthy AI tools for the benefit of patients. Despite the growing emphasis on algorithmic fairness and carbon footprint, these metrics were scarce in the literature review. The overwhelming focus was on statistical similarity using distance metrics while sequential logic detection was scarce. A consensus-backed framework that includes all relevant quality dimensions can provide assurance for safe and responsible real-life applications of SD.
- [1429] arXiv:2401.13722 (cross-list from cs.HC) [ pdf , other ]
-
Title: Proactive Emotion Tracker: AI-Driven Continuous Mood and Emotion MonitoringAuthors: Mohammad Asif , Sudhakar Mishra , Ankush Sonker , Sanidhya Gupta , Somesh Kumar Maurya , Uma Shanker TiwarySubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: This research project aims to tackle the growing mental health challenges in today's digital age. It employs a modified pre-trained BERT model to detect depressive text within social media and users' web browsing data, achieving an impressive 93% test accuracy. Simultaneously, the project aims to incorporate physiological signals from wearable devices, such as smartwatches and EEG sensors, to provide long-term tracking and prognosis of mood disorders and emotional states. This comprehensive approach holds promise for enhancing early detection of depression and advancing overall mental health outcomes.
- [1430] arXiv:2401.13751 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: A Systematic Approach to Robustness Modelling for Deep Convolutional Neural NetworksSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Abstract: Convolutional neural networks have shown to be widely applicable to a large number of fields when large amounts of labelled data are available. The recent trend has been to use models with increasingly larger sets of tunable parameters to increase model accuracy, reduce model loss, or create more adversarially robust models -- goals that are often at odds with one another. In particular, recent theoretical work raises questions about the ability for even larger models to generalize to data outside of the controlled train and test sets. As such, we examine the role of the number of hidden layers in the ResNet model, demonstrated on the MNIST, CIFAR10, CIFAR100 datasets. We test a variety of parameters including the size of the model, the floating point precision, and the noise level of both the training data and the model output. To encapsulate the model's predictive power and computational cost, we provide a method that uses induced failures to model the probability of failure as a function of time and relate that to a novel metric that allows us to quickly determine whether or not the cost of training a model outweighs the cost of attacking it. Using this approach, we are able to approximate the expected failure rate using a small number of specially crafted samples rather than increasingly larger benchmark datasets. We demonstrate the efficacy of this technique on both the MNIST and CIFAR10 datasets using 8-, 16-, 32-, and 64-bit floating-point numbers, various data pre-processing techniques, and several attacks on five configurations of the ResNet model. Then, using empirical measurements, we examine the various trade-offs between cost, robustness, latency, and reliability to find that larger models do not significantly aid in adversarial robustness despite costing significantly more to train.
- [1431] arXiv:2401.13782 (cross-list from cs.DL) [ pdf , other ]
-
Title: Tweets to Citations: Unveiling the Impact of Social Media Influencers on AI Research VisibilityComments: 10 Pages, 14 FiguresSubjects: Digital Libraries (cs.DL) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract: As the number of accepted papers at AI and ML conferences reaches into the thousands, it has become unclear how researchers access and read research publications. In this paper, we investigate the role of social media influencers in enhancing the visibility of machine learning research, particularly the citation counts of papers they share. We have compiled a comprehensive dataset of over 8,000 papers, spanning tweets from December 2018 to October 2023, alongside controls precisely matched by 9 key covariates. Our statistical and causal inference analysis reveals a significant increase in citations for papers endorsed by these influencers, with median citation counts 2-3 times higher than those of the control group. Additionally, the study delves into the geographic, gender, and institutional diversity of highlighted authors. Given these findings, we advocate for a responsible approach to curation, encouraging influencers to uphold the journalistic standard that includes showcasing diverse research topics, authors, and institutions.
- [1432] arXiv:2401.13796 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer LearningComments: under revSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Machine Learning (ML) has revolutionized various domains, offering predictive capabilities in several areas. However, with the increasing accessibility of ML tools, many practitioners, lacking deep ML expertise, adopt a "push the button" approach, utilizing user-friendly interfaces without a thorough understanding of underlying algorithms. While this approach provides convenience, it raises concerns about the reliability of outcomes, leading to challenges such as incorrect performance evaluation. This paper addresses a critical issue in ML, known as data leakage, where unintended information contaminates the training data, impacting model performance evaluation. Users, due to a lack of understanding, may inadvertently overlook crucial steps, leading to optimistic performance estimates that may not hold in real-world scenarios. The discrepancy between evaluated and actual performance on new data is a significant concern. In particular, this paper categorizes data leakage in ML, discussing how certain conditions can propagate through the ML workflow. Furthermore, it explores the connection between data leakage and the specific task being addressed, investigates its occurrence in Transfer Learning, and compares standard inductive ML with transductive ML frameworks. The conclusion summarizes key findings, emphasizing the importance of addressing data leakage for robust and reliable ML applications.
- [1433] arXiv:2401.13800 (cross-list from cs.RO) [ pdf , other ]
-
Title: Multi-Object Navigation in real environments using hybrid policiesSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Navigation has been classically solved in robotics through the combination of SLAM and planning. More recently, beyond waypoint planning, problems involving significant components of (visual) high-level reasoning have been explored in simulated environments, mostly addressed with large-scale machine learning, in particular RL, offline-RL or imitation learning. These methods require the agent to learn various skills like local planning, mapping objects and querying the learned spatial representations. In contrast to simpler tasks like waypoint planning (PointGoal), for these more complex tasks the current state-of-the-art models have been thoroughly evaluated in simulation but, to our best knowledge, not yet in real environments.
In this work we focus on sim2real transfer. We target the challenging Multi-Object Navigation (Multi-ON) task and port it to a physical environment containing real replicas of the originally virtual Multi-ON objects. We introduce a hybrid navigation method, which decomposes the problem into two different skills: (1) waypoint navigation is addressed with classical SLAM combined with a symbolic planner, whereas (2) exploration, semantic mapping and goal retrieval are dealt with deep neural networks trained with a combination of supervised learning and RL. We show the advantages of this approach compared to end-to-end methods both in simulation and a real environment and outperform the SOTA for this task. - [1434] arXiv:2401.13802 (cross-list from cs.SE) [ pdf , other ]
-
Title: Investigating the Efficacy of Large Language Models for Code Clone DetectionAuthors: Mohamad Khajezade , Jie JW Wu , Fatemeh Hendijani Fard , Gema Rodríguez-Pérez , Mohamed Sami ShehataSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) have demonstrated remarkable success in various natural language processing and software engineering tasks, such as code generation. The LLMs are mainly utilized in the prompt-based zero/few-shot paradigm to guide the model in accomplishing the task. GPT-based models are one of the popular ones studied for tasks such as code comment generation or test generation. These tasks are `generative' tasks. However, there is limited research on the usage of LLMs for `non-generative' tasks such as classification using the prompt-based paradigm. In this preliminary exploratory study, we investigated the applicability of LLMs for Code Clone Detection (CCD), a non-generative task. By building a mono-lingual and cross-lingual CCD dataset derived from CodeNet, we first investigated two different prompts using ChatGPT to detect Type-4 code clones in Java-Java and Java-Ruby pairs in a zero-shot setting. We then conducted an analysis to understand the strengths and weaknesses of ChatGPT in CCD. ChatGPT surpasses the baselines in cross-language CCD attaining an F1-score of 0.877 and achieves comparable performance to fully fine-tuned models for mono-lingual CCD, with an F1-score of 0.878. Also, the prompt and the difficulty level of the problems has an impact on the performance of ChatGPT. Finally we provide insights and future directions based on our initial analysis
- [1435] arXiv:2401.13822 (cross-list from cs.LG) [ pdf , other ]
-
Title: Navigating Dataset Documentations in AI: A Large-Scale Analysis of Dataset Cards on Hugging FaceComments: Accepted to the main conference of ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Advances in machine learning are closely tied to the creation of datasets. While data documentation is widely recognized as essential to the reliability, reproducibility, and transparency of ML, we lack a systematic empirical understanding of current dataset documentation practices. To shed light on this question, here we take Hugging Face -- one of the largest platforms for sharing and collaborating on ML models and datasets -- as a prominent case study. By analyzing all 7,433 dataset documentation on Hugging Face, our investigation provides an overview of the Hugging Face dataset ecosystem and insights into dataset documentation practices, yielding 5 main findings: (1) The dataset card completion rate shows marked heterogeneity correlated with dataset popularity. (2) A granular examination of each section within the dataset card reveals that the practitioners seem to prioritize Dataset Description and Dataset Structure sections, while the Considerations for Using the Data section receives the lowest proportion of content. (3) By analyzing the subsections within each section and utilizing topic modeling to identify key topics, we uncover what is discussed in each section, and underscore significant themes encompassing both technical and social impacts, as well as limitations within the Considerations for Using the Data section. (4) Our findings also highlight the need for improved accessibility and reproducibility of datasets in the Usage sections. (5) In addition, our human annotation evaluation emphasizes the pivotal role of comprehensive dataset content in shaping individuals' perceptions of a dataset card's overall quality. Overall, our study offers a unique perspective on analyzing dataset documentation through large-scale data science analysis and underlines the need for more thorough dataset documentation in machine learning research.
- [1436] arXiv:2401.13827 (cross-list from cs.LG) [ pdf , other ]
-
Title: Traffic Learning and Proactive UAV Trajectory Planning for Data Uplink in Markovian IoT ModelsJournal-ref: IEEE Internet of Things JournalSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Networking and Internet Architecture (cs.NI)
Abstract: The age of information (AoI) is used to measure the freshness of the data. In IoT networks, the traditional resource management schemes rely on a message exchange between the devices and the base station (BS) before communication which causes high AoI, high energy consumption, and low reliability. Unmanned aerial vehicles (UAVs) as flying BSs have many advantages in minimizing the AoI, energy-saving, and throughput improvement. In this paper, we present a novel learning-based framework that estimates the traffic arrival of IoT devices based on Markovian events. The learning proceeds to optimize the trajectory of multiple UAVs and their scheduling policy. First, the BS predicts the future traffic of the devices. We compare two traffic predictors: the forward algorithm (FA) and the long short-term memory (LSTM). Afterward, we propose a deep reinforcement learning (DRL) approach to optimize the optimal policy of each UAV. Finally, we manipulate the optimum reward function for the proposed DRL approach. Simulation results show that the proposed algorithm outperforms the random-walk (RW) baseline model regarding the AoI, scheduling accuracy, and transmission power.
- [1437] arXiv:2401.13835 (cross-list from cs.LG) [ pdf , other ]
-
Title: The Calibration Gap between Model and Human Confidence in Large Language ModelsAuthors: Mark Steyvers , Heliodoro Tejeda , Aakriti Kumar , Catarina Belem , Sheer Karny , Xinyue Hu , Lukas Mayer , Padhraic SmythComments: 27 pages, 10 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Abstract: For large language models (LLMs) to be trusted by humans they need to be well-calibrated in the sense that they can accurately assess and communicate how likely it is that their predictions are correct. Recent work has focused on the quality of internal LLM confidence assessments, but the question remains of how well LLMs can communicate this internal model confidence to human users. This paper explores the disparity between external human confidence in an LLM's responses and the internal confidence of the model. Through experiments involving multiple-choice questions, we systematically examine human users' ability to discern the reliability of LLM outputs. Our study focuses on two key areas: (1) assessing users' perception of true LLM confidence and (2) investigating the impact of tailored explanations on this perception. The research highlights that default explanations from LLMs often lead to user overestimation of both the model's confidence and its' accuracy. By modifying the explanations to more accurately reflect the LLM's internal confidence, we observe a significant shift in user perception, aligning it more closely with the model's actual confidence levels. This adjustment in explanatory approach demonstrates potential for enhancing user trust and accuracy in assessing LLM outputs. The findings underscore the importance of transparent communication of confidence levels in LLMs, particularly in high-stakes applications where understanding the reliability of AI-generated information is essential.
- [1438] arXiv:2401.13848 (cross-list from cs.LG) [ pdf , other ]
-
Title: A V2X-based Privacy Preserving Federated Measuring and Learning SystemComments: 8 pages, 5 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Abstract: Future autonomous vehicles (AVs) will use a variety of sensors that generate a vast amount of data. Naturally, this data not only serves self-driving algorithms; but can also assist other vehicles or the infrastructure in real-time decision-making. Consequently, vehicles shall exchange their measurement data over Vehicle-to-Everything (V2X) technologies. Moreover, predicting the state of the road network might be beneficial too. With such a prediction, we might mitigate road congestion, balance parking lot usage, or optimize the traffic flow. That would decrease transportation costs as well as reduce its environmental impact.
In this paper, we propose a federated measurement and learning system that provides real-time data to fellow vehicles over Vehicle-to-Vehicle (V2V) communication while also operating a federated learning (FL) scheme over the Vehicle-to-Network (V2N) link to create a predictive model of the transportation network. As we are yet to have real-world AV data, we model it with a non-IID (independent and identically distributed) dataset to evaluate the capabilities of the proposed system in terms of performance and privacy. Results indicate that the proposed FL scheme improves learning performance and prevents eavesdropping at the aggregator server side. - [1439] arXiv:2401.13849 (cross-list from cs.CL) [ pdf , other ]
-
Title: TPD: Enhancing Student Language Model Reasoning via Principle Discovery and GuidanceAuthors: Haorui Wang (1), Rongzhi Zhang (1), Yinghao Li (1), Lingkai Kong (1), Yuchen Zhuang (1), Xiusi Chen (2), Chao Zhang (1) ((1) College of Computing, Georgia Institute of Technology, (2) Department of Computer Science, University of California, Los Angeles)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have recently showcased remarkable reasoning abilities. However, larger models often surpass their smaller counterparts in reasoning tasks, posing the challenge of effectively transferring these capabilities from larger models. Existing approaches heavily rely on extensive fine-tuning data or continuous interactions with a superior teacher LLM during inference. We introduce a principle-based teacher-student framework called ``Teaching via Principle Discovery'' (TPD) to address these limitations. Inspired by human learning mechanisms, TPD mimics the interaction between a teacher and a student using a principle-based approach. The teacher LLM generates problem-solving instructions and corrective principles based on the student LLM's errors. These principles guide the refinement of instructions and the selection of instructive examples from a validation set. This enables the student model to learn from both the teacher's guidance and its own mistakes. Once the student model begins making inferences, TPD requires no further intervention from the teacher LLM or humans. Through extensive experiments across eight reasoning tasks, we demonstrate the effectiveness of TPD. Compared to standard chain-of-thought prompting, TPD significantly improves the student model's performance, achieving $6.2\%$ improvement on average.
- [1440] arXiv:2401.13904 (cross-list from cs.LG) [ pdf , other ]
-
Title: Empowering Machines to Think Like Chemists: Unveiling Molecular Structure-Polarity Relationships with Hierarchical Symbolic RegressionComments: 33 pages, 6 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Databases (cs.DB); Applications (stat.AP)
Abstract: Thin-layer chromatography (TLC) is a crucial technique in molecular polarity analysis. Despite its importance, the interpretability of predictive models for TLC, especially those driven by artificial intelligence, remains a challenge. Current approaches, utilizing either high-dimensional molecular fingerprints or domain-knowledge-driven feature engineering, often face a dilemma between expressiveness and interpretability. To bridge this gap, we introduce Unsupervised Hierarchical Symbolic Regression (UHiSR), combining hierarchical neural networks and symbolic regression. UHiSR automatically distills chemical-intuitive polarity indices, and discovers interpretable equations that link molecular structure to chromatographic behavior.
- [1441] arXiv:2401.13913 (cross-list from cs.LG) [ pdf , other ]
-
Title: Spectral Clustering for Discrete DistributionsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Discrete distribution clustering (D2C) was often solved by Wasserstein barycenter methods. These methods are under a common assumption that clusters can be well represented by barycenters, which may not hold in many real applications. In this work, we propose a simple yet effective framework based on spectral clustering and distribution affinity measures (e.g., maximum mean discrepancy and Wasserstein distance) for D2C. To improve the scalability, we propose to use linear optimal transport to construct affinity matrices efficiently on large datasets. We provide theoretical guarantees for the success of the proposed methods in clustering distributions. Experiments on synthetic and real data show that our methods outperform the baselines largely in terms of both clustering accuracy and computational efficiency.
- [1442] arXiv:2401.13919 (cross-list from cs.CL) [ pdf , other ]
-
Title: WebVoyager: Building an End-to-End Web Agent with Large Multimodal ModelsAuthors: Hongliang He , Wenlin Yao , Kaixin Ma , Wenhao Yu , Yong Dai , Hongming Zhang , Zhenzhong Lan , Dong YuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The rapid advancement of large language models (LLMs) has led to a new era marked by the development of autonomous applications in real-world scenarios, which drives innovation in creating advanced web agents. Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots, greatly limiting their applicability in real-world scenarios. To bridge this gap, we introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites. Moreover, we establish a new benchmark by compiling real-world tasks from 15 popular websites and introduce an automatic evaluation protocol leveraging multimodal understanding abilities of GPT-4V to evaluate open-ended web agents. We show that WebVoyager achieves a 59.1% task success rate on our benchmark, significantly surpassing the performance of both GPT-4 (All Tools) and the WebVoyager (text-only) setups, underscoring the exceptional capability of WebVoyager. The proposed automatic evaluation metric achieves 85.3% agreement with human judgment, indicating its effectiveness in providing reliable and accurate assessments of web agents.
- [1443] arXiv:2401.13920 (cross-list from cs.LG) [ pdf , other ]
-
Title: LocMoE: A Low-overhead MoE for Large Language Model TrainingAuthors: Jing Li , Zhijie Sun , Xuan He , Li Zeng , Yi Lin , Entong Li , Binfan Zheng , Rongqian Zhao , Xin ChenSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The Mixtures-of-Experts (MoE) model is a widespread distributed and integrated learning method for large language models (LLM), which is favored due to its ability to sparsify and expand models efficiently. However, the performance of MoE is limited by load imbalance and high latency of All-To-All communication, along with relatively redundant computation owing to large expert capacity. Load imbalance may result from existing routing policies that consistently tend to select certain experts. The frequent inter-node communication in the All-To-All procedure also significantly prolongs the training time. To alleviate the above performance problems, we propose a novel routing strategy that combines load balance and locality by converting partial inter-node communication to that of intra-node. Notably, we elucidate that there is a minimum threshold for expert capacity, calculated through the maximal angular deviation between the gating weights of the experts and the assigned tokens. We port these modifications on the PanGu-Sigma model based on the MindSpore framework with multi-level routing and conduct experiments on Ascend clusters. The experiment results demonstrate that the proposed LocMoE reduces training time per epoch by 12.68% to 22.24% compared to classical routers, such as hash router and switch router, without impacting the model accuracy.
- [1444] arXiv:2401.13945 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: General Automatic Solution Generation of Social ProblemsJournal-ref: Machine Intelligence Research 2024Subjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Multiagent Systems (cs.MA)
Abstract: Given the escalating intricacy and multifaceted nature of contemporary social systems, manually generating solutions to address pertinent social issues has become a formidable task. In response to this challenge, the rapid development of artificial intelligence has spurred the exploration of computational methodologies aimed at automatically generating solutions. However, current methods for auto-generation of solutions mainly concentrate on local social regulations that pertain to specific scenarios. Here, we report an automatic social operating system (ASOS) designed for general social solution generation, which is built upon agent-based models, enabling both global and local analyses and regulations of social problems across spatial and temporal dimensions. ASOS adopts a hypergraph with extensible social semantics for a comprehensive and structured representation of social dynamics. It also incorporates a generalized protocol for standardized hypergraph operations and a symbolic hybrid framework that delivers interpretable solutions, yielding a balance between regulatory efficacy and function viability. To demonstrate the effectiveness of ASOS, we apply it to the domain of averting extreme events within international oil futures markets. By generating a new trading role supplemented by new mechanisms, ASOS can adeptly discern precarious market conditions and make front-running interventions for non-profit purposes. This study demonstrates that ASOS provides an efficient and systematic approach for generating solutions for enhancing our society.
- [1445] arXiv:2401.13968 (cross-list from cs.LG) [ pdf , other ]
-
Title: Dynamic Long-Term Time-Series Forecasting via Meta Transformer NetworksAuthors: Muhammad Anwar Ma'sum , MD Rasel Sarkar , Mahardhika Pratama , Savitha Ramasamy , Sreenatha Anavatti , Lin Liu , Habibullah , Ryszard KowalczykComments: Under Consideration in IEEE Transactions on Artificial IntelligenceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: A reliable long-term time-series forecaster is highly demanded in practice but comes across many challenges such as low computational and memory footprints as well as robustness against dynamic learning environments. This paper proposes Meta-Transformer Networks (MANTRA) to deal with the dynamic long-term time-series forecasting tasks. MANTRA relies on the concept of fast and slow learners where a collection of fast learners learns different aspects of data distributions while adapting quickly to changes. A slow learner tailors suitable representations to fast learners. Fast adaptations to dynamic environments are achieved using the universal representation transformer layers producing task-adapted representations with a small number of parameters. Our experiments using four datasets with different prediction lengths demonstrate the advantage of our approach with at least $3\%$ improvements over the baseline algorithms for both multivariate and univariate settings. Source codes of MANTRA are publicly available in \url{ this https URL }.
- [1446] arXiv:2401.13974 (cross-list from cs.CV) [ pdf , other ]
-
Title: BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: Recent text-to-image generation models have demonstrated incredible success in generating images that faithfully follow input prompts. However, the requirement of using words to describe a desired concept provides limited control over the appearance of the generated concepts. In this work, we address this shortcoming by proposing an approach to enable personalization capabilities in existing text-to-image diffusion models. We propose a novel architecture (BootPIG) that allows a user to provide reference images of an object in order to guide the appearance of a concept in the generated images.
The proposed BootPIG architecture makes minimal modifications to a pretrained text-to-image diffusion model and utilizes a separate UNet model to steer the generations toward the desired appearance. We introduce a training procedure that allows us to bootstrap personalization capabilities in the BootPIG architecture using data generated from pretrained text-to-image models, LLM chat agents, and image segmentation models. In contrast to existing methods that require several days of pretraining, the BootPIG architecture can be trained in approximately 1 hour. Experiments on the DreamBooth dataset demonstrate that BootPIG outperforms existing zero-shot methods while being comparable with test-time finetuning approaches. Through a user study, we validate the preference for BootPIG generations over existing methods both in maintaining fidelity to the reference object's appearance and aligning with textual prompts. - [1447] arXiv:2401.13976 (cross-list from cs.CV) [ pdf , other ]
-
Title: Learning to Manipulate Artistic ImagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recent advancement in computer vision has significantly lowered the barriers to artistic creation. Exemplar-based image translation methods have attracted much attention due to flexibility and controllability. However, these methods hold assumptions regarding semantics or require semantic information as the input, while accurate semantics is not easy to obtain in artistic images. Besides, these methods suffer from cross-domain artifacts due to training data prior and generate imprecise structure due to feature compression in the spatial domain. In this paper, we propose an arbitrary Style Image Manipulation Network (SIM-Net), which leverages semantic-free information as guidance and a region transportation strategy in a self-supervised manner for image generation. Our method balances computational efficiency and high resolution to a certain extent. Moreover, our method facilitates zero-shot style image manipulation. Both qualitative and quantitative experiments demonstrate the superiority of our method over state-of-the-art methods.Code is available at this https URL .
- [1448] arXiv:2401.13979 (cross-list from cs.CL) [ pdf , other ]
-
Title: Leeroo Orchestrator: Elevating LLMs Performance Through Model IntegrationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this paper, we propose an architecture to harness the collective knowledge of multiple trained LLMs to create a new state-of-the-art. At the core of this framework is a LLM-based orchestrator that is adept at picking the right underlying LLM experts for optimal task execution. Inspired by self-play in reinforcement learning, we created a loop of query generation, orchestration, and evaluation to generate training data for the orchestrator. Our evaluation focused on the MMLU benchmark, employing models with 7B, 13B, and 34B parameters available on Hugging Face. The results demonstrate new state-of-the-art open-source models: Our Leeroo orchestrator achieves performance on par with the Mixtral model while incurring only two-thirds of its cost. Moreover, increasing the allowed cost surpasses Mixtral's accuracy by over 5% at the same cost level, reaching an accuracy of 75.9%. Further enhancements were observed when integrating GPT4 into the underlying model pool. The Leeroo orchestrator nearly matches GPT4's performance at half the cost and even exceeds GPT4's results with a 25% cost reduction. These findings illustrate the potential of our architecture in creating state-of-the-art and cost-effective LLMs by optimizing the synergy between multiple LLMs to achieve superior performance outcomes.
- [1449] arXiv:2401.13986 (cross-list from cs.CL) [ pdf , other ]
-
Title: Towards Consistent Natural-Language Explanations via Explanation-Consistency FinetuningComments: arXiv admin note: text overlap with arXiv:2307.08678Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) often generate convincing, fluent explanations. However, different from humans, they often generate inconsistent explanations on different inputs. For example, an LLM may generate the explanation "all birds can fly" when answering the question "Can sparrows fly?" but meanwhile answer "no" to the related question "Can penguins fly?". Explanations should be consistent across related examples so that they allow a human to simulate the LLM's decision process on multiple examples. We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples. EC-finetuning involves finetuning LLMs on synthetic data that is carefully constructed to contain consistent explanations. Across a variety of question-answering datasets in various domains, EC-finetuning yields a 10.0% relative explanation consistency improvement on four finetuning datasets, and generalizes to seven out-of-distribution datasets not seen during finetuning (+4.5% relative). Code is available at this https URL .
- [1450] arXiv:2401.13987 (cross-list from cs.LG) [ pdf , other ]
-
Title: Cross-Domain Few-Shot Learning via Adaptive Transformer NetworksAuthors: Naeem Paeedeh , Mahardhika Pratama , Muhammad Anwar Ma'sum , Wolfgang Mayer , Zehong Cao , Ryszard KowlczykComments: Under Consideration in Knowledge-based SystemsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Most few-shot learning works rely on the same domain assumption between the base and the target tasks, hindering their practical applications. This paper proposes an adaptive transformer network (ADAPTER), a simple but effective solution for cross-domain few-shot learning where there exist large domain shifts between the base task and the target task. ADAPTER is built upon the idea of bidirectional cross-attention to learn transferable features between the two domains. The proposed architecture is trained with DINO to produce diverse, and less biased features to avoid the supervision collapse problem. Furthermore, the label smoothing approach is proposed to improve the consistency and reliability of the predictions by also considering the predicted labels of the close samples in the embedding space. The performance of ADAPTER is rigorously evaluated in the BSCD-FSL benchmarks in which it outperforms prior arts with significant margins.
- [1451] arXiv:2401.13996 (cross-list from cs.CL) [ pdf , other ]
-
Title: Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-EvolutionAuthors: Cheng Qian , Shihao Liang , Yujia Qin , Yining Ye , Xin Cong , Yankai Lin , Yesai Wu , Zhiyuan Liu , Maosong SunComments: 18 pages, 5 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces Investigate-Consolidate-Exploit (ICE), a novel strategy for enhancing the adaptability and flexibility of AI agents through inter-task self-evolution. Unlike existing methods focused on intra-task learning, ICE promotes the transfer of knowledge between tasks for genuine self-evolution, similar to human experience learning. The strategy dynamically investigates planning and execution trajectories, consolidates them into simplified workflows and pipelines, and exploits them for improved task execution. Our experiments on the XAgent framework demonstrate ICE's effectiveness, reducing API calls by as much as 80% and significantly decreasing the demand for the model's capability. Specifically, when combined with GPT-3.5, ICE's performance matches that of raw GPT-4 across various agent tasks. We argue that this self-evolution approach represents a paradigm shift in agent design, contributing to a more robust AI community and ecosystem, and moving a step closer to full autonomy.
- [1452] arXiv:2401.14003 (cross-list from cs.CL) [ pdf , other ]
-
Title: ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge BasesComments: Proceedings of EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Reasoning over Commonsense Knowledge Bases (CSKB), i.e. CSKB reasoning, has been explored as a way to acquire new commonsense knowledge based on reference knowledge in the original CSKBs and external prior knowledge. Despite the advancement of Large Language Models (LLM) and prompt engineering techniques in various reasoning tasks, they still struggle to deal with CSKB reasoning. One of the problems is that it is hard for them to acquire explicit relational constraints in CSKBs from only in-context exemplars, due to a lack of symbolic reasoning capabilities (Bengio et al., 2021). To this end, we proposed **ConstraintChecker**, a plugin over prompting techniques to provide and check explicit constraints. When considering a new knowledge instance, ConstraintChecker employs a rule-based module to produce a list of constraints, then it uses a zero-shot learning module to check whether this knowledge instance satisfies all constraints. The acquired constraint-checking result is then aggregated with the output of the main prompting technique to produce the final output. Experimental results on CSKB Reasoning benchmarks demonstrate the effectiveness of our method by bringing consistent improvements over all prompting methods. Codes and data are available at \url{ this https URL }.
- [1453] arXiv:2401.14011 (cross-list from cs.CL) [ pdf , other ]
-
Title: CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and ReasoningAuthors: Zheqi He , Xinya Wu , Pengfei Zhou , Richeng Xuan , Guang Liu , Xi Yang , Qiannan Zhu , Hua HuangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Multimedia (cs.MM)
Abstract: Multi-modal large language models(MLLMs) have achieved remarkable progress and demonstrated powerful knowledge comprehension and reasoning abilities. However, the mastery of domain-specific knowledge, which is essential for evaluating the intelligence of MLLMs, continues to be a challenge. Current multi-modal benchmarks for domain-specific knowledge concentrate on multiple-choice questions and are predominantly available in English, which imposes limitations on the comprehensiveness of the evaluation. To this end, we introduce CMMU, a novel benchmark for multi-modal and multi-type question understanding and reasoning in Chinese. CMMU consists of 3,603 questions in 7 subjects, covering knowledge from primary to high school. The questions can be categorized into 3 types: multiple-choice, multiple-response, and fill-in-the-blank, bringing greater challenges to MLLMs. In addition, we propose a rigorous evaluation strategy called ShiftCheck for assessing multiple-choice questions. The strategy aims to reduce position bias, minimize the influence of randomness on correctness, and perform a quantitative analysis of position bias. We evaluate seven open-source MLLMs along with GPT4-V, Gemini-Pro, and Qwen-VL-Plus. The results demonstrate that CMMU poses a significant challenge to the recent MLLMs.
- [1454] arXiv:2401.14019 (cross-list from cs.CL) [ pdf , other ]
-
Title: Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AIAuthors: Elron Bandel , Yotam Perlitz , Elad Venezian , Roni Friedman-Melamed , Ofir Arviv , Matan Orbach , Shachar Don-Yehyia , Dafna Sheinwald , Ariel Gera , Leshem Choshen , Michal Shmueli-Scheuer , Yoav KatzComments: Submitted to NAACL demo trackSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In the dynamic landscape of generative NLP, traditional text processing pipelines limit research flexibility and reproducibility, as they are tailored to specific dataset, task, and model combinations. The escalating complexity, involving system prompts, model-specific formats, instructions, and more, calls for a shift to a structured, modular, and customizable solution. Addressing this need, we present Unitxt, an innovative library for customizable textual data preparation and evaluation tailored to generative language models. Unitxt natively integrates with common libraries like HuggingFace and LM-eval-harness and deconstructs processing flows into modular components, enabling easy customization and sharing between practitioners. These components encompass model-specific formats, task prompts, and many other comprehensive dataset processing definitions. The Unitxt-Catalog centralizes these components, fostering collaboration and exploration in modern textual data workflows. Beyond being a tool, Unitxt is a community-driven platform, empowering users to build, share, and advance their pipelines collaboratively. Join the Unitxt community at this https URL
- [1455] arXiv:2401.14032 (cross-list from cs.CV) [ pdf , other ]
-
Title: GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian SplattingComments: IJCAI2024 submit, 8 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: We introduce a novel large-scale scene reconstruction benchmark using the newly developed 3D representation approach, Gaussian Splatting, on our expansive U-Scene dataset. U-Scene encompasses over one and a half square kilometres, featuring a comprehensive RGB dataset coupled with LiDAR ground truth. For data acquisition, we employed the Matrix 300 drone equipped with the high-accuracy Zenmuse L1 LiDAR, enabling precise rooftop data collection. This dataset, offers a unique blend of urban and academic environments for advanced spatial analysis convers more than 1.5 km$^2$. Our evaluation of U-Scene with Gaussian Splatting includes a detailed analysis across various novel viewpoints. We also juxtapose these results with those derived from our accurate point cloud dataset, highlighting significant differences that underscore the importance of combine multi-modal information
- [1456] arXiv:2401.14043 (cross-list from cs.CL) [ pdf , other ]
-
Title: Towards Goal-oriented Large Language Model Prompting: A SurveySubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have shown prominent performance in various downstream tasks in which prompt engineering plays a pivotal role in optimizing LLMs' performance. This paper, not as an overview of current prompt engineering methods, aims to highlight the limitation of designing prompts while holding an anthropomorphic assumption that expects LLMs to think like humans. From our review of 35 representative studies, we demonstrate that a goal-oriented prompt formulation, which guides LLMs to follow established human logical thinking, significantly improves the performance of LLMs. Furthermore, We introduce a novel taxonomy that categorizes goal-oriented prompting methods into five interconnected stages and we demonstrate the broad applicability of our framework by summarizing ten applicable tasks. With four future directions proposed, we hope to further emphasize and promote goal-oriented prompt engineering.
- [1457] arXiv:2401.14057 (cross-list from cs.RO) [ pdf , ps , other ]
-
Title: Left/Right Brain, human motor control and the implications for roboticsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Neurons and Cognition (q-bio.NC)
Abstract: Neural Network movement controllers promise a variety of advantages over conventional control methods however they are not widely adopted due to their inability to produce reliably precise movements. This research explores a bilateral neural network architecture as a control system for motor tasks. We aimed to achieve hemispheric specialisation similar to what is observed in humans across different tasks; the dominant system (usually the right hand, left hemisphere) excels at tasks involving coordination and efficiency of movement, and the non-dominant system performs better at tasks requiring positional stability. Specialisation was achieved by training the hemispheres with different loss functions tailored toward the expected behaviour of the respective hemispheres. We compared bilateral models with and without specialised hemispheres, with and without inter-hemispheric connectivity (representing the biological Corpus Callosum), and unilateral models with and without specialisation. The models were trained and tested on two tasks common in the human motor control literature: the random reach task, suited to the dominant system, a model with better coordination, and the hold position task, suited to the non-dominant system, a model with more stable movement. Each system out-performed the non-favoured system in its preferred task. For both tasks, a bilateral model outperforms the 'non-preferred' hand, and is as good or better than the 'preferred' hand. The Corpus Callosum tends to improve performance, but not always for the specialised models.
- [1458] arXiv:2401.14066 (cross-list from cs.CV) [ pdf , other ]
-
Title: CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal DiffusionAuthors: Nisha Huang , Weiming Dong , Yuxin Zhang , Fan Tang , Ronghui Li , Chongyang Ma , Xiu Li , Changsheng XuSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Large-scale text-to-image generative models have made impressive strides, showcasing their ability to synthesize a vast array of high-quality images. However, adapting these models for artistic image editing presents two significant challenges. Firstly, users struggle to craft textual prompts that meticulously detail visual elements of the input image. Secondly, prevalent models, when effecting modifications in specific zones, frequently disrupt the overall artistic style, complicating the attainment of cohesive and aesthetically unified artworks. To surmount these obstacles, we build the innovative unified framework CreativeSynth, which is based on a diffusion model with the ability to coordinate multimodal inputs and multitask in the field of artistic image generation. By integrating multimodal features with customized attention mechanisms, CreativeSynth facilitates the importation of real-world semantic content into the domain of art through inversion and real-time style transfer. This allows for the precise manipulation of image style and content while maintaining the integrity of the original model parameters. Rigorous qualitative and quantitative evaluations underscore that CreativeSynth excels in enhancing artistic images' fidelity and preserves their innate aesthetic essence. By bridging the gap between generative models and artistic finesse, CreativeSynth becomes a custom digital palette.
- [1459] arXiv:2401.14067 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Ta'keed: The First Generative Fact-Checking System for Arabic ClaimsComments: 9 pages, conference paperJournal-ref: VOLUME 14 NUMBER 01 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces Ta'keed, an explainable Arabic automatic fact-checking system. While existing research often focuses on classifying claims as "True" or "False," there is a limited exploration of generating explanations for claim credibility, particularly in Arabic. Ta'keed addresses this gap by assessing claim truthfulness based on retrieved snippets, utilizing two main components: information retrieval and LLM-based claim verification. We compiled the ArFactEx, a testing gold-labelled dataset with manually justified references, to evaluate the system. The initial model achieved a promising F1 score of 0.72 in the classification task. Meanwhile, the system's generated explanations are compared with gold-standard explanations syntactically and semantically. The study recommends evaluating using semantic similarities, resulting in an average cosine similarity score of 0.76. Additionally, we explored the impact of varying snippet quantities on claim classification accuracy, revealing a potential correlation, with the model using the top seven hits outperforming others with an F1 score of 0.77.
- [1460] arXiv:2401.14079 (cross-list from cs.SE) [ pdf , other ]
-
Title: From Requirements to Architecture: An AI-Based Journey to Semi-Automatically Generate Software ArchitecturesComments: 4 pages, vision paper, submitted to the ICSE workshop Designing2024Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Designing domain models and software architectures represents a significant challenge in software development, as the resulting architectures play a vital role in fulfilling the system's quality of service. Due to time pressure, architects often model only one architecture based on their known limited domain understanding, patterns, and experience instead of thoroughly analyzing the domain and evaluating multiple candidates, selecting the best fitting. Existing approaches try to generate domain models based on requirements, but still require time-consuming manual effort to achieve good results. Therefore, in this vision paper, we propose a method to generate software architecture candidates semi-automatically based on requirements using artificial intelligence techniques. We further envision an automatic evaluation and trade-off analysis of the generated architecture candidates using, e.g., the architecture trade-off analysis method combined with large language models and quantitative analyses. To evaluate this approach, we aim to analyze the quality of the generated architecture models and the efficiency and effectiveness of our proposed process by conducting qualitative studies.
- [1461] arXiv:2401.14109 (cross-list from cs.CL) [ pdf , other ]
-
Title: CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor NetworksAuthors: Andrei Tomut , Saeed S. Jahromi , Sukhbinder Singh , Faysal Ishtiaq , Cesar Muñoz , Prabdeep Singh Bajaj , Ali Elborady , Gianni del Bimbo , Mehrazin Alizadeh , David Montero , Pablo Martin-Ramiro , Muhammad Ibrahim , Oussama Tahiri Alaoui , John Malcolm , Samuel Mugel , Roman OrusComments: 4 pages, 3 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantum Physics (quant-ph)
Abstract: Large Language Models (LLMs) such as ChatGPT and LlaMA are advancing rapidly in generative Artificial Intelligence (AI), but their immense size poses significant challenges, such as huge training and inference costs, substantial energy demands, and limitations for on-site deployment. Traditional compression methods such as pruning, distillation, and low-rank approximation focus on reducing the effective number of neurons in the network, while quantization focuses on reducing the numerical precision of individual weights to reduce the model size while keeping the number of neurons fixed. While these compression methods have been relatively successful in practice, there's no compelling reason to believe that truncating the number of neurons is an optimal strategy. In this context, this paper introduces CompactifAI, an innovative LLM compression approach using quantum-inspired Tensor Networks that focuses on the model's correlation space instead, allowing for a more controlled, refined and interpretable model compression. Our method is versatile and can be implemented with - or on top of - other compression techniques. As a benchmark, we demonstrate that CompactifAI alone enables compression of the LlaMA-2 7B model to only $30\%$ of its original size while recovering over $90\%$ of the original accuracy after a brief distributed retraining.
- [1462] arXiv:2401.14110 (cross-list from cs.LG) [ pdf , other ]
-
Title: Towards Cheaper Inference in Deep Networks with Lower Bit-Width AccumulatorsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Abstract: The majority of the research on the quantization of Deep Neural Networks (DNNs) is focused on reducing the precision of tensors visible by high-level frameworks (e.g., weights, activations, and gradients). However, current hardware still relies on high-accuracy core operations. Most significant is the operation of accumulating products. This high-precision accumulation operation is gradually becoming the main computational bottleneck. This is because, so far, the usage of low-precision accumulators led to a significant degradation in performance. In this work, we present a simple method to train and fine-tune high-end DNNs, to allow, for the first time, utilization of cheaper, $12$-bits accumulators, with no significant degradation in accuracy. Lastly, we show that as we decrease the accumulation precision further, using fine-grained gradient approximations can improve the DNN accuracy.
- [1463] arXiv:2401.14112 (cross-list from cs.LG) [ pdf , other ]
-
Title: FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-DesignAuthors: Haojun Xia , Zhen Zheng , Xiaoxia Wu , Shiyang Chen , Zhewei Yao , Stephen Youn , Arash Bakhtiari , Michael Wyatt , Donglin Zhuang , Zhongzhu Zhou , Olatunji Ruwase , Yuxiong He , Shuaiwen Leon SongComments: Adding URL link of the source codeSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR)
Abstract: Six-bit quantization (FP6) can effectively reduce the size of large language models (LLMs) and preserve the model quality consistently across varied applications. However, existing systems do not provide Tensor Core support for FP6 quantization and struggle to achieve practical performance improvements during LLM inference. It is challenging to support FP6 quantization on GPUs due to (1) unfriendly memory access of model weights with irregular bit-width and (2) high runtime overhead of weight de-quantization. To address these problems, we propose TC-FPx, the first full-stack GPU kernel design scheme with unified Tensor Core support of float-point weights for various quantization bit-width. We integrate TC-FPx kernel into an existing inference system, providing new end-to-end support (called FP6-LLM) for quantized LLM inference, where better trade-offs between inference cost and model quality are achieved. Experiments show that FP6-LLM enables the inference of LLaMA-70b using only a single GPU, achieving 1.69x-2.65x higher normalized inference throughput than the FP16 baseline. The source code is publicly available at this https URL .
- [1464] arXiv:2401.14142 (cross-list from cs.CV) [ pdf , other ]
-
Title: Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic InterpretationsComments: Accepted by ICLR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Abstract: Existing methods, such as concept bottleneck models (CBMs), have been successful in providing concept-based interpretations for black-box deep learning models. They typically work by predicting concepts given the input and then predicting the final class label given the predicted concepts. However, (1) they often fail to capture the high-order, nonlinear interaction between concepts, e.g., correcting a predicted concept (e.g., "yellow breast") does not help correct highly correlated concepts (e.g., "yellow belly"), leading to suboptimal final accuracy; (2) they cannot naturally quantify the complex conditional dependencies between different concepts and class labels (e.g., for an image with the class label "Kentucky Warbler" and a concept "black bill", what is the probability that the model correctly predicts another concept "black crown"), therefore failing to provide deeper insight into how a black-box model works. In response to these limitations, we propose Energy-based Concept Bottleneck Models (ECBMs). Our ECBMs use a set of neural networks to define the joint energy of candidate (input, concept, class) tuples. With such a unified interface, prediction, concept correction, and conditional dependency quantification are then represented as conditional probabilities, which are generated by composing different energy functions. Our ECBMs address both limitations of existing CBMs, providing higher accuracy and richer concept interpretations. Empirical results show that our approach outperforms the state-of-the-art on real-world datasets.
- [1465] arXiv:2401.14151 (cross-list from cs.LG) [ pdf , other ]
-
Title: True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement LearningComments: Accepted by ICLR2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Despite the impressive performance across numerous tasks, large language models (LLMs) often fail in solving simple decision-making tasks due to the misalignment of the knowledge in LLMs with environments. On the contrary, reinforcement learning (RL) agents learn policies from scratch, which makes them always align with environments but difficult to incorporate prior knowledge for efficient explorations. To narrow the gap, we propose TWOSOME, a novel general online framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL without requiring any prepared datasets or prior knowledge of the environments. Firstly, we query the joint probabilities of each valid action with LLMs to form behavior policies. Then, to enhance the stability and robustness of the policies, we propose two normalization methods and summarize four prompt design principles. Finally, we design a novel parameter-efficient training architecture where the actor and critic share one frozen LLM equipped with low-rank adapters (LoRA) updated by PPO. We conduct extensive experiments to evaluate TWOSOME. i) TWOSOME exhibits significantly better sample efficiency and performance compared to the conventional RL method, PPO, and prompt tuning method, SayCan, in both classical decision-making environment, Overcooked, and simulated household environment, VirtualHome. ii) Benefiting from LLMs' open-vocabulary feature, TWOSOME shows superior generalization ability to unseen tasks. iii) Under our framework, there is no significant loss of the LLMs' original ability during online PPO finetuning.
- [1466] arXiv:2401.14155 (cross-list from cs.LG) [ pdf , other ]
-
Title: Alleviating Structural Distribution Shift in Graph Anomaly DetectionComments: Accepted to WSDM 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph anomaly detection (GAD) is a challenging binary classification problem due to its different structural distribution between anomalies and normal nodes -- abnormal nodes are a minority, therefore holding high heterophily and low homophily compared to normal nodes. Furthermore, due to various time factors and the annotation preferences of human experts, the heterophily and homophily can change across training and testing data, which is called structural distribution shift (SDS) in this paper. The mainstream methods are built on graph neural networks (GNNs), benefiting the classification of normals from aggregating homophilous neighbors, yet ignoring the SDS issue for anomalies and suffering from poor generalization.
This work solves the problem from a feature view. We observe that the degree of SDS varies between anomalies and normal nodes. Hence to address the issue, the key lies in resisting high heterophily for anomalies meanwhile benefiting the learning of normals from homophily. We tease out the anomaly features on which we constrain to mitigate the effect of heterophilous neighbors and make them invariant. We term our proposed framework as Graph Decomposition Network (GDN). Extensive experiments are conducted on two benchmark datasets, and the proposed framework achieves a remarkable performance boost in GAD, especially in an SDS environment where anomalies have largely different structural distribution across training and testing environments. Codes are open-sourced in this https URL . - [1467] arXiv:2401.14166 (cross-list from cs.CL) [ pdf , other ]
-
Title: BayesPrompt: Prompting Large-Scale Pre-Trained Language Models on Few-shot Inference via Debiased Domain AbstractionAuthors: Jiangmeng Li , Fei Song , Yifan Jin , Wenwen Qiang , Changwen Zheng , Fuchun Sun , Hui XiongComments: Accepted by ICLR2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: As a novel and effective fine-tuning paradigm based on large-scale pre-trained language models (PLMs), prompt-tuning aims to reduce the gap between downstream tasks and pre-training objectives. While prompt-tuning has yielded continuous advancements in various tasks, such an approach still remains a persistent defect: prompt-tuning methods fail to generalize to specific few-shot patterns. From the perspective of distribution analyses, we disclose that the intrinsic issues behind the phenomenon are the over-multitudinous conceptual knowledge contained in PLMs and the abridged knowledge for target downstream domains, which jointly result in that PLMs mis-locate the knowledge distributions corresponding to the target domains in the universal knowledge embedding space. To this end, we intuitively explore to approximate the unabridged target domains of downstream tasks in a debiased manner, and then abstract such domains to generate discriminative prompts, thereby providing the de-ambiguous guidance for PLMs. Guided by such an intuition, we propose a simple yet effective approach, namely BayesPrompt, to learn prompts that contain the domain discriminative information against the interference from domain-irrelevant knowledge. BayesPrompt primitively leverages known distributions to approximate the debiased factual distributions of target domains and further uniformly samples certain representative features from the approximated distributions to generate the ultimate prompts for PLMs. We provide theoretical insights with the connection to domain adaptation. Empirically, our method achieves state-of-the-art performance on benchmarks.
- [1468] arXiv:2401.14174 (cross-list from cs.CC) [ pdf , other ]
-
Title: The Boundaries of Tractability in Hierarchical Task Network PlanningSubjects: Computational Complexity (cs.CC) ; Artificial Intelligence (cs.AI)
Abstract: We study the complexity-theoretic boundaries of tractability for three classical problems in the context of Hierarchical Task Network Planning: the validation of a provided plan, whether an executable plan exists, and whether a given state can be reached by some plan. We show that all three problems can be solved in polynomial time on primitive task networks of constant partial order width (and a generalization thereof), whereas for the latter two problems this holds only under a provably necessary restriction to the state space. Next, we obtain an algorithmic meta-theorem along with corresponding lower bounds to identify tight conditions under which general polynomial-time solvability results can be lifted from primitive to general task networks. Finally, we enrich our investigation by analyzing the parameterized complexity of the three considered problems, and show that (1) fixed-parameter tractability for all three problems can be achieved by replacing the partial order width with the vertex cover number of the network as the parameter, and (2) other classical graph-theoretic parameters of the network (including treewidth, treedepth, and the aforementioned partial order width) do not yield fixed-parameter tractability for any of the three problems.
- [1469] arXiv:2401.14176 (cross-list from cs.SE) [ pdf , other ]
-
Title: Copilot Refinement: Addressing Code Smells in Copilot-Generated Python CodeSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: As one of the most popular dynamic languages, Python experiences a decrease in readability and maintainability when code smells are present. Recent advancements in Large Language Models have sparked growing interest in AI-enabled tools for both code generation and refactoring. GitHub Copilot is one such tool that has gained widespread usage. Copilot Chat, released on September 2023, functions as an interactive tool aims at facilitating natural language-powered coding. However, limited attention has been given to understanding code smells in Copilot-generated Python code and Copilot's ability to fix the code smells it generates. To this end, we built a dataset comprising 102 code smells in Copilot-generated Python code. Our aim is to first explore the occurrence of code smells in Copilot-generated Python code and then evaluate the effectiveness of Copilot in fixing these code smells employing different prompts. The results show that 8 out of 10 types of Python smells can be detected in Copilot-generated Python code, among which Multiply-Nested Container is the most common one. For these code smells, Copilot Chat achieves a highest fixing rate of 87.1%, showing promise in fixing Python code smells generated by Copilot itself. Besides, the effectiveness of Copilot Chat in fixing these smells can be improved with the provision of more detailed prompts. However, using Copilot Chat to fix these smells might introduce new code smells.
- [1470] arXiv:2401.14185 (cross-list from cs.SD) [ pdf , other ]
-
Title: TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down FusionJournal-ref: 2023 13th International Conference on Information Science and Technology (ICIST), Cairo, Egypt, 2023, pp. 243-252Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: Audio-visual speech separation has gained significant traction in recent years due to its potential applications in various fields such as speech recognition, diarization, scene analysis and assistive technologies. Designing a lightweight audio-visual speech separation network is important for low-latency applications, but existing methods often require higher computational costs and more parameters to achieve better separation performance. In this paper, we present an audio-visual speech separation model called Top-Down-Fusion Net (TDFNet), a state-of-the-art (SOTA) model for audio-visual speech separation, which builds upon the architecture of TDANet, an audio-only speech separation method. TDANet serves as the architectural foundation for the auditory and visual networks within TDFNet, offering an efficient model with fewer parameters. On the LRS2-2Mix dataset, TDFNet achieves a performance increase of up to 10\% across all performance metrics compared with the previous SOTA method CTCNet. Remarkably, these results are achieved using fewer parameters and only 28\% of the multiply-accumulate operations (MACs) of CTCNet. In essence, our method presents a highly effective and efficient solution to the challenges of speech separation within the audio-visual domain, making significant strides in harnessing visual information optimally.
- [1471] arXiv:2401.14215 (cross-list from cs.CL) [ pdf , other ]
-
Title: Commonsense-augmented Memory Construction and Management in Long-term Conversations via Context-aware Persona RefinementComments: Accepted to EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Memorizing and utilizing speakers' personas is a common practice for response generation in long-term conversations. Yet, human-authored datasets often provide uninformative persona sentences that hinder response quality. This paper presents a novel framework that leverages commonsense-based persona expansion to address such issues in long-term conversation. While prior work focuses on not producing personas that contradict others, we focus on transforming contradictory personas into sentences that contain rich speaker information, by refining them based on their contextual backgrounds with designed strategies. As the pioneer of persona expansion in multi-session settings, our framework facilitates better response generation via human-like persona refinement. The supplementary video of our work is available at https://caffeine-15bbf.web.app/.
- [1472] arXiv:2401.14228 (cross-list from cs.CL) [ pdf , other ]
-
Title: Assessing the Portability of Parameter Matrices Trained by Parameter-Efficient Finetuning MethodsComments: Accepted to Findings of EACL 2024. Camera ready versionSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: As the cost of training ever larger language models has grown, so has the interest in reusing previously learnt knowledge. Transfer learning methods have shown how reusing non-task-specific knowledge can help in subsequent task-specific learning. In this paper, we investigate the inverse: porting whole functional modules that encode task-specific knowledge from one model to another. We designed a study comprising 1,440 training/testing runs to test the portability of modules trained by parameter-efficient finetuning (PEFT) techniques, using sentiment analysis as an example task. We test portability in a wide range of scenarios, involving different PEFT techniques and different pretrained host models, among other dimensions. We compare the performance of ported modules with that of equivalent modules trained (i) from scratch, and (ii) from parameters sampled from the same distribution as the ported module. We find that the ported modules far outperform the two alternatives tested, but that there are interesting performance differences between the four PEFT techniques. We conclude that task-specific knowledge in the form of structurally modular sets of parameters as produced by PEFT techniques is highly portable, but that degree of success depends on type of PEFT and on differences between originating and receiving pretrained models.
- [1473] arXiv:2401.14232 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: AR-GAN: Generative Adversarial Network-Based Defense Method Against Adversarial Attacks on the Traffic Sign Classification System of Autonomous VehiclesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract: This study developed a generative adversarial network (GAN)-based defense method for traffic sign classification in an autonomous vehicle (AV), referred to as the attack-resilient GAN (AR-GAN). The novelty of the AR-GAN lies in (i) assuming zero knowledge of adversarial attack models and samples and (ii) providing consistently high traffic sign classification performance under various adversarial attack types. The AR-GAN classification system consists of a generator that denoises an image by reconstruction, and a classifier that classifies the reconstructed image. The authors have tested the AR-GAN under no-attack and under various adversarial attacks, such as Fast Gradient Sign Method (FGSM), DeepFool, Carlini and Wagner (C&W), and Projected Gradient Descent (PGD). The authors considered two forms of these attacks, i.e., (i) black-box attacks (assuming the attackers possess no prior knowledge of the classifier), and (ii) white-box attacks (assuming the attackers possess full knowledge of the classifier). The classification performance of the AR-GAN was compared with several benchmark adversarial defense methods. The results showed that both the AR-GAN and the benchmark defense methods are resilient against black-box attacks and could achieve similar classification performance to that of the unperturbed images. However, for all the white-box attacks considered in this study, the AR-GAN method outperformed the benchmark defense methods. In addition, the AR-GAN was able to maintain its high classification performance under varied white-box adversarial perturbation magnitudes, whereas the performance of the other defense methods dropped abruptly at increased perturbation magnitudes.
- [1474] arXiv:2401.14257 (cross-list from cs.CV) [ pdf , other ]
-
Title: Sketch2NeRF: Multi-view Sketch-guided Text-to-3D GenerationAuthors: Minglin Chen , Weihao Yuan , Yukun Wang , Zhe Sheng , Yisheng He , Zilong Dong , Liefeng Bo , Yulan GuoComments: 11 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recently, text-to-3D approaches have achieved high-fidelity 3D content generation using text description. However, the generated objects are stochastic and lack fine-grained control. Sketches provide a cheap approach to introduce such fine-grained control. Nevertheless, it is challenging to achieve flexible control from these sketches due to their abstraction and ambiguity. In this paper, we present a multi-view sketch-guided text-to-3D generation framework (namely, Sketch2NeRF) to add sketch control to 3D generation. Specifically, our method leverages pretrained 2D diffusion models (e.g., Stable Diffusion and ControlNet) to supervise the optimization of a 3D scene represented by a neural radiance field (NeRF). We propose a novel synchronized generation and reconstruction method to effectively optimize the NeRF. In the experiments, we collected two kinds of multi-view sketch datasets to evaluate the proposed method. We demonstrate that our method can synthesize 3D consistent contents with fine-grained sketch control while being high-fidelity to text prompts. Extensive results show that our method achieves state-of-the-art performance in terms of sketch similarity and text alignment.
- [1475] arXiv:2401.14267 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Transformers and Cortical Waves: Encoders for Pulling In Context Across TimeComments: 25 pages, 4 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The capabilities of transformer networks such as ChatGPT and other Large Language Models (LLMs) have captured the world's attention. The crucial computational mechanism underlying their performance relies on transforming a complete input sequence - for example, all the words in a sentence into a long "encoding vector" - that allows transformers to learn long-range temporal dependencies in naturalistic sequences. Specifically, "self-attention" applied to this encoding vector enhances temporal context in transformers by computing associations between pairs of words in the input sequence. We suggest that waves of neural activity, traveling across single cortical regions or across multiple regions at the whole-brain scale, could implement a similar encoding principle. By encapsulating recent input history into a single spatial pattern at each moment in time, cortical waves may enable temporal context to be extracted from sequences of sensory inputs, the same computational principle used in transformers.
- [1476] arXiv:2401.14279 (cross-list from cs.SE) [ pdf , other ]
-
Title: ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using ChatGPTAuthors: Azmain Kabir , Shaowei Wang , Yuan Tian , Tse-Hsun (Peter) Chen , Muhammad Asaduzzaman , Wenbin ZhangSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Technical question and answering (Q&A) sites such as Stack Overflow have become an important source for software developers to seek knowledge. However, code snippets on Q&A sites are usually uncompilable and semantically incomplete for compilation due to unresolved types and missing dependent libraries, which raises the obstacle for users to reuse or analyze Q&A code snippets. Prior approaches either are not designed for synthesizing compilable code or suffer from a low compilation success rate. To address this problem, we propose ZS4C, a lightweight approach to perform zero-shot synthesis of compilable code from incomplete code snippets using Large Language Model (LLM). ZS4C operates in two stages. In the first stage, ZS4C utilizes an LLM, i.e., ChatGPT, to identify missing import statements for a given code snippet, leveraging our designed task-specific prompt template. In the second stage, ZS4C fixes compilation errors caused by incorrect import statements and syntax errors through collaborative work between ChatGPT and a compiler. We thoroughly evaluated ZS4C on a widely used benchmark called StatType-SO against the SOTA approach SnR. Compared with SnR, ZS4C improves the compilation rate from 63% to 87.6%, with a 39.3% improvement. On average, ZS4C can infer more accurate import statements than SnR, with an improvement of 6.6% in the F1.
- [1477] arXiv:2401.14280 (cross-list from cs.CL) [ pdf , other ]
-
Title: RomanSetu: Efficiently unlocking multilingual capabilities of Large Language Models models via RomanizationAuthors: Jaavid Aktar Husain , Raj Dabre , Aswanth Kumar , Jay Gala , Thanmay Jayakumar , Ratish Puduppully , Anoop KunchukuttanComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This study addresses the challenge of extending Large Language Models (LLMs) to non-English languages using non-Roman scripts. We propose an approach that utilizes the romanized form of text as an interface for LLMs, hypothesizing that its frequent informal use and shared tokens with English enhance cross-lingual alignment. Our approach involves the continual pretraining of an English LLM like Llama 2 on romanized text of non-English, non-Roman script languages, followed by instruction tuning on romanized data. The results indicate that romanized text not only reduces token fertility by 2x-4x but also matches or outperforms native script representation across various NLU, NLG, and MT tasks. Moreover, the embeddings computed on romanized text exhibit closer alignment with their English translations than those from the native script. Our approach presents a promising direction for leveraging the power of English LLMs in languages traditionally underrepresented in NLP.
- [1478] arXiv:2401.14285 (cross-list from cs.CV) [ pdf , other ]
-
Title: POUR-Net: A Population-Prior-Aided Over-Under-Representation Network for Low-Count PET Attenuation Map GenerationAuthors: Bo Zhou , Jun Hou , Tianqi Chen , Yinchi Zhou , Xiongchao Chen , Huidong Xie , Qiong Liu , Xueqi Guo , Yu-Jung Tsai , Vladimir Y. Panin , Takuya Toyonaga , James S. Duncan , Chi LiuComments: 10 pages, 5 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Low-dose PET offers a valuable means of minimizing radiation exposure in PET imaging. However, the prevalent practice of employing additional CT scans for generating attenuation maps (u-map) for PET attenuation correction significantly elevates radiation doses. To address this concern and further mitigate radiation exposure in low-dose PET exams, we propose POUR-Net - an innovative population-prior-aided over-under-representation network that aims for high-quality attenuation map generation from low-dose PET. First, POUR-Net incorporates an over-under-representation network (OUR-Net) to facilitate efficient feature extraction, encompassing both low-resolution abstracted and fine-detail features, for assisting deep generation on the full-resolution level. Second, complementing OUR-Net, a population prior generation machine (PPGM) utilizing a comprehensive CT-derived u-map dataset, provides additional prior information to aid OUR-Net generation. The integration of OUR-Net and PPGM within a cascade framework enables iterative refinement of $\mu$-map generation, resulting in the production of high-quality $\mu$-maps. Experimental results underscore the effectiveness of POUR-Net, showing it as a promising solution for accurate CT-free low-count PET attenuation correction, which also surpasses the performance of previous baseline methods.
- [1479] arXiv:2401.14292 (cross-list from cs.RO) [ pdf , other ]
-
Title: Single and bi-layered 2-D acoustic soft tactile skin (AST2)Comments: IEEE Robosoft conference 2024 (accepted)Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: This paper aims to present an innovative and cost-effective design for Acoustic Soft Tactile (AST) Skin, with the primary goal of significantly enhancing the accuracy of 2-D tactile feature estimation. The existing challenge lies in achieving precise tactile feature estimation, especially concerning contact geometry characteristics, using cost-effective solutions. We hypothesise that by harnessing acoustic energy through dedicated acoustic channels in 2 layers beneath the sensing surface and analysing amplitude modulation, we can effectively decode interactions on the sensory surface, thereby improving tactile feature estimation. Our approach involves the distinct separation of hardware components responsible for emitting and receiving acoustic signals, resulting in a modular and highly customizable skin design. Practical tests demonstrate the effectiveness of this novel design, achieving remarkable precision in estimating contact normal forces (MAE < 0.8 N), 2D contact localisation (MAE < 0.7 mm), and contact surface diameter (MAE < 0.3 mm). In conclusion, the AST skin, with its innovative design and modular architecture, successfully addresses the challenge of tactile feature estimation. The presented results showcase its ability to precisely estimate various tactile features, making it a practical and cost-effective solution for robotic applications.
- [1480] arXiv:2401.14295 (cross-list from cs.CL) [ pdf , other ]
-
Title: Demystifying Chains, Trees, and Graphs of ThoughtsAuthors: Maciej Besta , Florim Memedi , Zhenyu Zhang , Robert Gerstenberger , Guangyuan Piao , Nils Blach , Piotr Nyczyk , Marcin Copik , Grzegorz Kwaśniewski , Jürgen Müller , Lukas Gianinazzi , Ales Kubicek , Hubert Niewiadomski , Aidan O'Mahony , Onur Mutlu , Torsten HoeflerSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models' (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph. As illustrated with numerous examples, this paradigm significantly enhances the LLM's capability to solve numerous tasks, ranging from logical or mathematical reasoning to planning or creative writing. To facilitate the understanding of this growing field and pave the way for future developments, we devise a general blueprint for effective and efficient LLM reasoning schemes. For this, we conduct an in-depth analysis of the prompt execution pipeline, clarifying and clearly defining different concepts. We then build the first taxonomy of structure-enhanced LLM reasoning schemes. We focus on identifying fundamental classes of harnessed structures, and we analyze the representations of these structures, algorithms executed with these structures, and many others. We refer to these structures as reasoning topologies, because their representation becomes to a degree spatial, as they are contained within the LLM context. Our study compares existing prompting schemes using the proposed taxonomy, discussing how certain design choices lead to different patterns in performance and cost. We also outline theoretical underpinnings, relationships between prompting and other parts of the LLM ecosystem such as knowledge bases, and the associated research challenges. Our work will help to advance future prompt engineering techniques.
- [1481] arXiv:2401.14336 (cross-list from cs.CV) [ pdf , other ]
-
Title: Progressive Multi-task Anti-Noise Learning and Distilling Frameworks for Fine-grained Vehicle RecognitionAuthors: Dichao LiuSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Fine-grained vehicle recognition (FGVR) is an essential fundamental technology for intelligent transportation systems, but very difficult because of its inherent intra-class variation. Most previous FGVR studies only focus on the intra-class variation caused by different shooting angles, positions, etc., while the intra-class variation caused by image noise has received little attention. This paper proposes a progressive multi-task anti-noise learning (PMAL) framework and a progressive multi-task distilling (PMD) framework to solve the intra-class variation problem in FGVR due to image noise. The PMAL framework achieves high recognition accuracy by treating image denoising as an additional task in image recognition and progressively forcing a model to learn noise invariance. The PMD framework transfers the knowledge of the PMAL-trained model into the original backbone network, which produces a model with about the same recognition accuracy as the PMAL-trained model, but without any additional overheads over the original backbone network. Combining the two frameworks, we obtain models that significantly exceed previous state-of-the-art methods in recognition accuracy on two widely-used, standard FGVR datasets, namely Stanford Cars, and CompCars, as well as three additional surveillance image-based vehicle-type classification datasets, namely Beijing Institute of Technology (BIT)-Vehicle, Vehicle Type Image Data 2 (VTID2), and Vehicle Images Dataset for Make Model Recognition (VIDMMR), without any additional overheads over the original backbone networks. The source code is available at this https URL
- [1482] arXiv:2401.14362 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health SupportComments: The first two authors contributed equally to this work; typos correctedSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: People experiencing severe distress increasingly use Large Language Model (LLM) chatbots as mental health support tools. Discussions on social media have described how engagements were lifesaving for some, but evidence suggests that general-purpose LLM chatbots also have notable risks that could endanger the welfare of users if not designed responsibly. In this study, we investigate the lived experiences of people who have used LLM chatbots for mental health support. We build on interviews with 21 individuals from globally diverse backgrounds to analyze how users create unique support roles for their chatbots, fill in gaps in everyday care, and navigate associated cultural limitations when seeking support from chatbots. We ground our analysis in psychotherapy literature around effective support, and introduce the concept of therapeutic alignment, or aligning AI with therapeutic values for mental health contexts. Our study offers recommendations for how designers can approach the ethical and effective use of LLM chatbots and other AI mental health support tools in mental health care.
- [1483] arXiv:2401.14367 (cross-list from cs.CL) [ pdf , other ]
-
Title: Genie: Achieving Human Parity in Content-Grounded Datasets GenerationAuthors: Asaf Yehudai , Boaz Carmeli , Yosi Mass , Ofir Arviv , Nathaniel Mills , Assaf Toledo , Eyal Shnarch , Leshem ChoshenComments: Accepted to ICLR24Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The lack of high-quality data for content-grounded generation tasks has been identified as a major obstacle to advancing these tasks. To address this gap, we propose Genie, a novel method for automatically generating high-quality content-grounded data. It consists of three stages: (a) Content Preparation, (b) Generation: creating task-specific examples from the content (e.g., question-answer pairs or summaries). (c) Filtering mechanism aiming to ensure the quality and faithfulness of the generated data. We showcase this methodology by generating three large-scale synthetic data, making wishes, for Long-Form Question-Answering (LFQA), summarization, and information extraction. In a human evaluation, our generated data was found to be natural and of high quality. Furthermore, we compare models trained on our data with models trained on human-written data -- ELI5 and ASQA for LFQA and CNN-DailyMail for Summarization. We show that our models are on par with or outperforming models trained on human-generated data and consistently outperforming them in faithfulness. Finally, we applied our method to create LFQA data within the medical domain and compared a model trained on it with models trained on other domains.
- [1484] arXiv:2401.14371 (cross-list from cs.ET) [ pdf , other ]
-
Title: Efficient Optimisation of Physical Reservoir Computers using only a Delayed InputSubjects: Emerging Technologies (cs.ET) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Optics (physics.optics)
Abstract: We present an experimental validation of a recently proposed optimization technique for reservoir computing, using an optoelectronic setup. Reservoir computing is a robust framework for signal processing applications, and the development of efficient optimization approaches remains a key challenge. The technique we address leverages solely a delayed version of the input signal to identify the optimal operational region of the reservoir, simplifying the traditionally time-consuming task of hyperparameter tuning. We verify the effectiveness of this approach on different benchmark tasks and reservoir operating conditions.
- [1485] arXiv:2401.14373 (cross-list from cs.CL) [ pdf , other ]
-
Title: TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and GenerationAuthors: Gökçe Uludoğan , Zeynep Yirmibeşoğlu Balal , Furkan Akkurt , Melikşah Türker , Onur Güngör , Susan ÜsküdarlıSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The recent advances in natural language processing have predominantly favored well-resourced English-centric models, resulting in a significant gap with low-resource languages. In this work, we introduce the language model TURNA, which is developed for the low-resource language Turkish and is capable of both natural language understanding and generation tasks. TURNA is pretrained with an encoder-decoder architecture based on the unified framework UL2 with a diverse corpus that we specifically curated for this purpose. We evaluated TURNA with three generation tasks and five understanding tasks for Turkish. The results show that TURNA outperforms several multilingual models in both understanding and generation tasks, and competes with monolingual Turkish models in understanding tasks. TURNA is made available at this https URL .
- [1486] arXiv:2401.14403 (cross-list from cs.RO) [ pdf , other ]
-
Title: Adaptive Mobile Manipulation for Articulated Objects In the Open WorldComments: Website at this https URLSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: Deploying robots in open-ended unstructured environments such as homes has been a long-standing research problem. However, robots are often studied only in closed-off lab settings, and prior mobile manipulation work is restricted to pick-move-place, which is arguably just the tip of the iceberg in this area. In this paper, we introduce Open-World Mobile Manipulation System, a full-stack approach to tackle realistic articulated object operation, e.g. real-world doors, cabinets, drawers, and refrigerators in open-ended unstructured environments. The robot utilizes an adaptive learning framework to initially learns from a small set of data through behavior cloning, followed by learning from online practice on novel objects that fall outside the training distribution. We also develop a low-cost mobile manipulation hardware platform capable of safe and autonomous online adaptation in unstructured environments with a cost of around 20,000 USD. In our experiments we utilize 20 articulate objects across 4 buildings in the CMU campus. With less than an hour of online learning for each object, the system is able to increase success rate from 50% of BC pre-training to 95% using online adaptation. Video results at this https URL
- [1487] arXiv:2401.14405 (cross-list from cs.CV) [ pdf , other ]
-
Title: Multimodal Pathway: Improve Transformers with Irrelevant Data from Other ModalitiesComments: CVPR 2024. Code and models are available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We propose to improve transformers of a specific modality with irrelevant data from other modalities, e.g., improve an ImageNet model with audio or point cloud datasets. We would like to highlight that the data samples of the target modality are irrelevant to the other modalities, which distinguishes our method from other works utilizing paired (e.g., CLIP) or interleaved data of different modalities. We propose a methodology named Multimodal Pathway - given a target modality and a transformer designed for it, we use an auxiliary transformer trained with data of another modality and construct pathways to connect components of the two models so that data of the target modality can be processed by both models. In this way, we utilize the universal sequence-to-sequence modeling abilities of transformers obtained from two modalities. As a concrete implementation, we use a modality-specific tokenizer and task-specific head as usual but utilize the transformer blocks of the auxiliary model via a proposed method named Cross-Modal Re-parameterization, which exploits the auxiliary weights without any inference costs. On the image, point cloud, video, and audio recognition tasks, we observe significant and consistent performance improvements with irrelevant data from other modalities. The code and models are available at this https URL .
- [1488] arXiv:2401.14412 (cross-list from cs.LG) [ pdf , other ]
-
Title: Harnessing Neuron Stability to Improve DNN VerificationComments: VeriStable and experimental data are available at: this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep Neural Networks (DNN) have emerged as an effective approach to tackling real-world problems. However, like human-written software, DNNs are susceptible to bugs and attacks. This has generated significant interests in developing effective and scalable DNN verification techniques and tools. In this paper, we present VeriStable, a novel extension of recently proposed DPLL-based constraint DNN verification approach. VeriStable leverages the insight that while neuron behavior may be non-linear across the entire DNN input space, at intermediate states computed during verification many neurons may be constrained to have linear behavior - these neurons are stable. Efficiently detecting stable neurons reduces combinatorial complexity without compromising the precision of abstractions. Moreover, the structure of clauses arising in DNN verification problems shares important characteristics with industrial SAT benchmarks. We adapt and incorporate multi-threading and restart optimizations targeting those characteristics to further optimize DPLL-based DNN verification. We evaluate the effectiveness of VeriStable across a range of challenging benchmarks including fully-connected feedforward networks (FNNs), convolutional neural networks (CNNs) and residual networks (ResNets) applied to the standard MNIST and CIFAR datasets. Preliminary results show that VeriStable is competitive and outperforms state-of-the-art DNN verification tools, including $\alpha$-$\beta$-CROWN and MN-BaB, the first and second performers of the VNN-COMP, respectively.
- [1489] arXiv:2401.14413 (cross-list from cs.LG) [ pdf , other ]
-
Title: Aprendizado de máquina aplicado na eletroquímicaAuthors: Carlos Eduardo do Egito Araújo , Lívia F. Sgobbi , Iwens Gervasio Sene Jr , Sergio Teixeira de CarvalhoComments: in Portuguese languageSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This systematic review focuses on analyzing the use of machine learning techniques for identifying and quantifying analytes in various electrochemical applications, presenting the available applications in the literature. Machine learning is a tool that can facilitate the analysis and enhance the understanding of processes involving various analytes. In electrochemical biosensors, it increases the precision of medical diagnostics, improving the identification of biomarkers and pathogens with high reliability. It can be effectively used for the classification of complex chemical products; in environmental monitoring, using low-cost sensors; in portable devices and wearable systems; among others. Currently, the analysis of some analytes is still performed manually, requiring the expertise of a specialist in the field and thus hindering the generalization of results. In light of the advancements in artificial intelligence today, this work proposes to carry out a systematic review of the literature on the applications of artificial intelligence techniques. A set of articles has been identified that address electrochemical problems using machine learning techniques, more specifically, supervised learning.
- [1490] arXiv:2401.14417 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Fuzzy Logic Function as a Post-hoc Explanator of the Nonlinear ClassifierJournal-ref: Fuzzy Logic and Technology, and Aggregation Operators. EUSFLAT AGOP 2023 2023. LNCS, vol. 14069, pp. 431-442. Springer, Cham (2023)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Pattern recognition systems implemented using deep neural networks achieve better results than linear models. However, their drawback is the black box property. This property means that one with no experience utilising nonlinear systems may need help understanding the outcome of the decision. Such a solution is unacceptable to the user responsible for the final decision. He must not only believe in the decision but also understand it. Therefore, recognisers must have an architecture that allows interpreters to interpret the findings. The idea of post-hoc explainable classifiers is to design an interpretable classifier parallel to the black box classifier, giving the same decisions as the black box classifier. This paper shows that the explainable classifier completes matching classification decisions with the black box classifier on the MNIST and FashionMNIST databases if Zadeh`s fuzzy logic function forms the classifier and DeconvNet importance gives the truth values. Since the other tested significance measures achieved lower performance than DeconvNet, it is the optimal transformation of the feature values to their truth values as inputs to the fuzzy logic function for the databases and recogniser architecture used.
- [1491] arXiv:2401.14424 (cross-list from cs.LG) [ pdf , other ]
-
Title: Discovering Mathematical Formulas from Data via GPT-guided Monte Carlo Tree SearchAuthors: Yanjie Li , Weijun Li , Lina Yu , Min Wu , Jingyi Liu , Wenqiang Li , Meilan Hao , Shu Wei , Yusong DengComments: 24 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Finding a concise and interpretable mathematical formula that accurately describes the relationship between each variable and the predicted value in the data is a crucial task in scientific research, as well as a significant challenge in artificial intelligence. This problem is referred to as symbolic regression, which is an NP-hard problem. In the previous year, a novel symbolic regression methodology utilizing Monte Carlo Tree Search (MCTS) was advanced, achieving state-of-the-art results on a diverse range of datasets. although this algorithm has shown considerable improvement in recovering target expressions compared to previous methods, the lack of guidance during the MCTS process severely hampers its search efficiency. Recently, some algorithms have added a pre-trained policy network to guide the search of MCTS, but the pre-trained policy network generalizes poorly. To optimize the trade-off between efficiency and versatility, we introduce SR-GPT, a novel algorithm for symbolic regression that integrates Monte Carlo Tree Search (MCTS) with a Generative Pre-Trained Transformer (GPT). By using GPT to guide the MCTS, the search efficiency of MCTS is significantly improved. Next, we utilize the MCTS results to further refine the GPT, enhancing its capabilities and providing more accurate guidance for the MCTS. MCTS and GPT are coupled together and optimize each other until the target expression is successfully determined. We conducted extensive evaluations of SR-GPT using 222 expressions sourced from over 10 different symbolic regression datasets. The experimental results demonstrate that SR-GPT outperforms existing state-of-the-art algorithms in accurately recovering symbolic expressions both with and without added noise.
- [1492] arXiv:2401.14425 (cross-list from cs.HC) [ pdf , other ]
-
Title: No Longer Trending on Artstation: Prompt Analysis of Generative AI ArtComments: Paper accepted for EvoMUSART 2024, Aberystwyth, Wales, United Kingdom, 3-5 April 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Neural and Evolutionary Computing (cs.NE)
Abstract: Image generation using generative AI is rapidly becoming a major new source of visual media, with billions of AI generated images created using diffusion models such as Stable Diffusion and Midjourney over the last few years. In this paper we collect and analyse over 3 million prompts and the images they generate. Using natural language processing, topic analysis and visualisation methods we aim to understand collectively how people are using text prompts, the impact of these systems on artists, and more broadly on the visual cultures they promote. Our study shows that prompting focuses largely on surface aesthetics, reinforcing cultural norms, popular conventional representations and imagery. We also find that many users focus on popular topics (such as making colouring books, fantasy art, or Christmas cards), suggesting that the dominant use for the systems analysed is recreational rather than artistic.
- [1493] arXiv:2401.14426 (cross-list from cs.LG) [ pdf , other ]
-
Title: M$^3$TN: Multi-gate Mixture-of-Experts based Multi-valued Treatment Network for Uplift ModelingComments: ICASSP 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Methodology (stat.ME)
Abstract: Uplift modeling is a technique used to predict the effect of a treatment (e.g., discounts) on an individual's response. Although several methods have been proposed for multi-valued treatment, they are extended from binary treatment methods. There are still some limitations. Firstly, existing methods calculate uplift based on predicted responses, which may not guarantee a consistent uplift distribution between treatment and control groups. Moreover, this may cause cumulative errors for multi-valued treatment. Secondly, the model parameters become numerous with many prediction heads, leading to reduced efficiency. To address these issues, we propose a novel \underline{M}ulti-gate \underline{M}ixture-of-Experts based \underline{M}ulti-valued \underline{T}reatment \underline{N}etwork (M$^3$TN). M$^3$TN consists of two components: 1) a feature representation module with Multi-gate Mixture-of-Experts to improve the efficiency; 2) a reparameterization module by modeling uplift explicitly to improve the effectiveness. We also conduct extensive experiments to demonstrate the effectiveness and efficiency of our M$^3$TN.
- [1494] arXiv:2401.14434 (cross-list from cs.CV) [ pdf , other ]
-
Title: Transforming gradient-based techniques into interpretable methodsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The explication of Convolutional Neural Networks (CNN) through xAI techniques often poses challenges in interpretation. The inherent complexity of input features, notably pixels extracted from images, engenders complex correlations. Gradient-based methodologies, exemplified by Integrated Gradients (IG), effectively demonstrate the significance of these features. Nevertheless, the conversion of these explanations into images frequently yields considerable noise. Presently, we introduce GAD (Gradient Artificial Distancing) as a supportive framework for gradient-based techniques. Its primary objective is to accentuate influential regions by establishing distinctions between classes. The essence of GAD is to limit the scope of analysis during visualization and, consequently reduce image noise. Empirical investigations involving occluded images have demonstrated that the identified regions through this methodology indeed play a pivotal role in facilitating class differentiation.
- [1495] arXiv:2401.14436 (cross-list from cs.MA) [ pdf , other ]
-
Title: Trust model of privacy-concerned, emotionally-aware agents in a cooperative logistics problemSubjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI)
Abstract: In this paper we propose a trust model to be used into a hypothetical mixed environment where humans and unmanned vehicles cooperate. We address the inclusion of emotions inside a trust model in a coherent way to the practical approaches to the current psychology theories. The most innovative contribution is how privacy issues play a role in the cooperation decisions of the emotional trust model. Both, emotions and trust have been cognitively modeled and managed with the Beliefs, Desires and Intentions (BDI) paradigm into autonomous agents implemented in GAML (the programming language of GAMA agent platform) that communicates using the IEEE FIPA standard. The trusting behaviour of these emotional agents is tested in a cooperative logistics problem where: agents have to move objects to destinations and some of the objects and places have privacy issues. The execution of simulations of this logistic problem shows how emotions and trust contribute to improve the performance of agents in terms of both, time savings and privacy protection
- [1496] arXiv:2401.14440 (cross-list from cs.CL) [ pdf , other ]
-
Title: Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI ModelsComments: EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: Recent studies of the emergent capabilities of transformer-based Natural Language Understanding (NLU) models have indicated that they have an understanding of lexical and compositional semantics. We provide evidence that suggests these claims should be taken with a grain of salt: we find that state-of-the-art Natural Language Inference (NLI) models are sensitive towards minor semantics preserving surface-form variations, which lead to sizable inconsistent model decisions during inference. Notably, this behaviour differs from valid and in-depth comprehension of compositional semantics, however does neither emerge when evaluating model accuracy on standard benchmarks nor when probing for syntactic, monotonic, and logically robust reasoning. We propose a novel framework to measure the extent of semantic sensitivity. To this end, we evaluate NLI models on adversarially generated examples containing minor semantics-preserving surface-form input noise. This is achieved using conditional text generation, with the explicit condition that the NLI model predicts the relationship between the original and adversarial inputs as a symmetric equivalence entailment. We systematically study the effects of the phenomenon across NLI models for $\textbf{in-}$ and $\textbf{out-of-}$ domain settings. Our experiments show that semantic sensitivity causes performance degradations of $12.92\%$ and $23.71\%$ average over $\textbf{in-}$ and $\textbf{out-of-}$ domain settings, respectively. We further perform ablation studies, analysing this phenomenon across models, datasets, and variations in inference and show that semantic sensitivity can lead to major inconsistency within model predictions.
- [1497] arXiv:2401.14444 (cross-list from cs.SD) [ pdf , other ]
-
Title: ICASSP 2024 Speech Signal Improvement ChallengeAuthors: Nicolae Catalin Ristea , Ando Saabas , Ross Cutler , Babak Naderi , Sebastian Braun , Solomiya BranetsSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
Abstract: The ICASSP 2024 Speech Signal Improvement Grand Challenge is intended to stimulate research in the area of improving the speech signal quality in communication systems. This marks our second challenge, building upon the success from the previous ICASSP 2023 Grand Challenge. We enhance the competition by introducing a dataset synthesizer, enabling all participating teams to start at a higher baseline, an objective metric for our extended P.804 tests, transcripts for the 2023 test set, and we add Word Accuracy (WAcc) as a metric. We evaluate a total of 13 systems in the real-time track and 11 systems in the non-real-time track using both subjective P.804 and objective Word Accuracy metrics.
- [1498] arXiv:2401.14446 (cross-list from cs.CY) [ pdf , other ]
-
Title: Black-Box Access is Insufficient for Rigorous AI AuditsAuthors: Stephen Casper , Carson Ezell , Charlotte Siegmann , Noam Kolt , Taylor Lynn Curtis , Benjamin Bucknall , Andreas Haupt , Kevin Wei , Jérémy Scheurer , Marius Hobbhahn , Lee Sharkey , Satyapriya Krishna , Marvin Von Hagen , Silas Alberti , Alan Chan , Qinyi Sun , Michael Gerovitch , David Bau , Max Tegmark , David Krueger , Dylan Hadfield-MenellSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of system access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to its training and deployment information (e.g., methodology, code, documentation, hyperparameters, data, deployment details, findings from internal evaluations) allows for auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.
- [1499] arXiv:2401.14447 (cross-list from cs.HC) [ pdf , other ]
-
Title: Wordflow: Social Prompt Engineering for Large Language ModelsComments: 8 pages, 7 figures. Wordflow is available at: this https URL The code is available at: this https URL For a demo video, see: this https URLSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) require well-crafted prompts for effective use. Prompt engineering, the process of designing prompts, is challenging, particularly for non-experts who are less familiar with AI technologies. While researchers have proposed techniques and tools to assist LLM users in prompt design, these works primarily target AI application developers rather than non-experts. To address this research gap, we propose social prompt engineering, a novel paradigm that leverages social computing techniques to facilitate collaborative prompt design. To investigate social prompt engineering, we introduce Wordflow, an open-source and social text editor that enables everyday users to easily create, run, share, and discover LLM prompts. Additionally, by leveraging modern web technologies, Wordflow allows users to run LLMs locally and privately in their browsers. Two usage scenarios highlight how social prompt engineering and our tool can enhance laypeople's interaction with LLMs. Wordflow is publicly accessible at this https URL .
- [1500] arXiv:2401.14469 (cross-list from cs.LG) [ pdf , other ]
-
Title: Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional KernelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)
Abstract: Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures, that surpass the performance of classical CNNs, by a considerable scalability and accuracy margin. This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers. Through an extensive analysis of millions of trained filters, with different sizes and from various models, we employed unsupervised clustering with autoencoders, to categorize these filters. Astonishingly, the patterns converged into a few main clusters, each resembling the difference of Gaussian (DoG) functions, and their first and second-order derivatives. Notably, we were able to classify over 95\% and 90\% of the filters from state-of-the-art ConvNextV2 and ConvNeXt models, respectively. This finding is not merely a technological curiosity; it echoes the foundational models neuroscientists have long proposed for the vision systems of mammals. Our results thus deepen our understanding of the emergent properties of trained DS-CNNs and provide a bridge between artificial and biological visual processing systems. More broadly, they pave the way for more interpretable and biologically-inspired neural network designs in the future.
- [1501] arXiv:2401.14484 (cross-list from cs.HC) [ pdf , other ]
-
Title: Design Principles for Generative AI ApplicationsAuthors: Justin D. Weisz , Jessica He , Michael Muller , Gabriela Hoefer , Rachel Miles , Werner GeyerComments: 34 pages, 4 figures. To be published in CHI 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Generative AI applications present unique design challenges. As generative AI technologies are increasingly being incorporated into mainstream applications, there is an urgent need for guidance on how to design user experiences that foster effective and safe use. We present six principles for the design of generative AI applications that address unique characteristics of generative AI UX and offer new interpretations and extensions of known issues in the design of AI applications. Each principle is coupled with a set of design strategies for implementing that principle via UX capabilities or through the design process. The principles and strategies were developed through an iterative process involving literature review, feedback from design practitioners, validation against real-world generative AI applications, and incorporation into the design process of two generative AI applications. We anticipate the principles to usefully inform the design of generative AI applications by driving actionable design recommendations.
- [1502] arXiv:2401.14488 (cross-list from cs.LG) [ pdf , other ]
-
Title: Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling researchSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Robotics (cs.RO)
Abstract: One problem with researching cognitive modeling and reinforcement learning (RL) is that researchers spend too much time on setting up an appropriate computational framework for their experiments. Many open source implementations of current RL algorithms exist, but there is a lack of a modular suite of tools combining different robotic simulators and platforms, data visualization, hyperparameter optimization, and baseline experiments. To address this problem, we present Scilab-RL, a software framework for efficient research in cognitive modeling and reinforcement learning for robotic agents. The framework focuses on goal-conditioned reinforcement learning using Stable Baselines 3 and the OpenAI gym interface. It enables native possibilities for experiment visualizations and hyperparameter optimization. We describe how these features enable researchers to conduct experiments with minimal time effort, thus maximizing research output.
- [1503] arXiv:2401.14489 (cross-list from cs.DC) [ pdf , other ]
-
Title: The Case for Co-Designing Model Architectures with HardwareAuthors: Quentin Anthony , Jacob Hatef , Deepak Narayanan , Stella Biderman , Stas Bekman , Junqi Yin , Aamir Shafi , Hari Subramoni , Dhabaleswar PandaSubjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI)
Abstract: While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL model to be more amenable to the target hardware can significantly improve the runtime performance of DL training and inference. In this paper, we provide a set of guidelines for users to maximize the runtime performance of their transformer models. These guidelines have been created by carefully considering the impact of various model hyperparameters controlling model shape on the efficiency of the underlying computation kernels executed on the GPU. We find the throughput of models with efficient model shapes is up to 39\% higher while preserving accuracy compared to models with a similar number of parameters but with unoptimized shapes.
- [1504] arXiv:2401.14504 (cross-list from eess.SY) [ pdf , other ]
-
Title: Learning When to See for Long-term Traffic Data Collection on Power-constrained DevicesComments: Accepted by IEEE 26th International Conference on Intelligent Transportation SystemsSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Collecting traffic data is crucial for transportation systems and urban planning, and is often more desirable through easy-to-deploy but power-constrained devices, due to the unavailability or high cost of power and network infrastructure. The limited power means an inevitable trade-off between data collection duration and accuracy/resolution. We introduce a novel learning-based framework that strategically decides observation timings for battery-powered devices and reconstructs the full data stream from sparsely sampled observations, resulting in minimal performance loss and a significantly prolonged system lifetime. Our framework comprises a predictor, a controller, and an estimator. The predictor utilizes historical data to forecast future trends within a fixed time horizon. The controller uses the forecasts to determine the next optimal timing for data collection. Finally, the estimator reconstructs the complete data profile from the sampled observations. We evaluate the performance of the proposed method on PeMS data by an RNN (Recurrent Neural Network) predictor and estimator, and a DRQN (Deep Recurrent Q-Network) controller, and compare it against the baseline that uses Kalman filter and uniform sampling. The results indicate that our method outperforms the baseline, primarily due to the inclusion of more representative data points in the profile, resulting in an overall 10\% improvement in estimation accuracy. Source code will be publicly available.
- [1505] arXiv:2401.14521 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Towards Interpretable Physical-Conceptual Catchment-Scale Hydrological Modeling using the Mass-Conserving-PerceptronComments: 50 pages, 7 Figures, 2 Tables, 1 Supplementary MaterialSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We investigate the applicability of machine learning technologies to the development of parsimonious, interpretable, catchment-scale hydrologic models using directed-graph architectures based on the mass-conserving perceptron (MCP) as the fundamental computational unit. Here, we focus on architectural complexity (depth) at a single location, rather than universal applicability (breadth) across large samples of catchments. The goal is to discover a minimal representation (numbers of cell-states and flow paths) that represents the dominant processes that can explain the input-state-output behaviors of a given catchment, with particular emphasis given to simulating the full range (high, medium, and low) of flow dynamics. We find that a HyMod-like architecture with three cell-states and two major flow pathways achieves such a representation at our study location, but that the additional incorporation of an input-bypass mechanism significantly improves the timing and shape of the hydrograph, while the inclusion of bi-directional groundwater mass exchanges significantly enhances the simulation of baseflow. Overall, our results demonstrate the importance of using multiple diagnostic metrics for model evaluation, while highlighting the need for designing training metrics that are better suited to extracting information across the full range of flow dynamics. Further, they set the stage for interpretable regional-scale MCP-based hydrological modeling (using large sample data) by using neural architecture search to determine appropriate minimal representations for catchments in different hydroclimatic regimes.
- [1506] arXiv:2401.14523 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Empathy and the Right to Be an Exception: What LLMs Can and Cannot DoSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Advances in the performance of large language models (LLMs) have led some researchers to propose the emergence of theory of mind (ToM) in artificial intelligence (AI). LLMs can attribute beliefs, desires, intentions, and emotions, and they will improve in their accuracy. Rather than employing the characteristically human method of empathy, they learn to attribute mental states by recognizing linguistic patterns in a dataset that typically do not include that individual. We ask whether LLMs' inability to empathize precludes them from honoring an individual's right to be an exception, that is, from making assessments of character and predictions of behavior that reflect appropriate sensitivity to a person's individuality. Can LLMs seriously consider an individual's claim that their case is different based on internal mental states like beliefs, desires, and intentions, or are they limited to judging that case based on its similarities to others? We propose that the method of empathy has special significance for honoring the right to be an exception that is distinct from the value of predictive accuracy, at which LLMs excel. We conclude by considering whether using empathy to consider exceptional cases has intrinsic or merely practical value and we introduce conceptual and empirical avenues for advancing this investigation.
- [1507] arXiv:2401.14524 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Evaluating GPT-3.5's Awareness and Summarization Abilities for European Constitutional Texts with Shared TopicsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Digital Libraries (cs.DL); Physics and Society (physics.soc-ph)
Abstract: Constitutions are foundational legal documents that underpin the governmental and societal structures. As such, they are a reflection of a nation's cultural and social uniqueness, but also contribute to establish topics of universal importance, like citizens' rights and duties (RD). In this work, using the renowned GPT-3.5, we leverage generative large language models to understand constitutional passages that transcend national boundaries. A key contribution of our study is the introduction of a novel application of abstractive summarization on a multi-source collection of constitutional texts, with a focus on European countries' constitution passages related to RD topics. Our results show the meaningfulness of GPT-3.5 to produce informative, coherent and faithful summaries capturing RD topics across European countries.
- [1508] arXiv:2401.14530 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Relative Value Biases in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Studies of reinforcement learning in humans and animals have demonstrated a preference for options that yielded relatively better outcomes in the past, even when those options are associated with lower absolute reward. The present study tested whether large language models would exhibit a similar bias. We had gpt-4-1106-preview (GPT-4 Turbo) and Llama-2-70B make repeated choices between pairs of options with the goal of maximizing payoffs. A complete record of previous outcomes was included in each prompt. Both models exhibited relative value decision biases similar to those observed in humans and animals. Making relative comparisons among outcomes more explicit magnified the bias, whereas prompting the models to estimate expected outcomes caused the bias to disappear. These results have implications for the potential mechanisms that contribute to context-dependent choice in human agents.
- [1509] arXiv:2401.14542 (cross-list from cs.SD) [ pdf , other ]
-
Title: Exploring Musical Roots: Applying Audio Embeddings to Empower Influence Attribution for a Generative Music ModelComments: 14 pages + references. Under conference reviewSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: Every artist has a creative process that draws inspiration from previous artists and their works. Today, "inspiration" has been automated by generative music models. The black box nature of these models obscures the identity of the works that influence their creative output. As a result, users may inadvertently appropriate, misuse, or copy existing artists' works. We establish a replicable methodology to systematically identify similar pieces of music audio in a manner that is useful for understanding training data attribution. A key aspect of our approach is to harness an effective music audio similarity measure. We compare the effect of applying CLMR and CLAP embeddings to similarity measurement in a set of 5 million audio clips used to train VampNet, a recent open source generative music model. We validate this approach with a human listening study. We also explore the effect that modifications of an audio example (e.g., pitch shifting, time stretching, background noise) have on similarity measurements. This work is foundational to incorporating automated influence attribution into generative modeling, which promises to let model creators and users move from ignorant appropriation to informed creation. Audio samples that accompany this paper are available at this https URL .
- [1510] arXiv:2401.14559 (cross-list from cs.CL) [ pdf , other ]
-
Title: Language Modelling Approaches to Adaptive Machine TranslationAuthors: Yasmin MoslemComments: PhD thesisSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Abstract: Consistency is a key requirement of high-quality translation. It is especially important to adhere to pre-approved terminology and adapt to corrected translations in domain-specific projects. Machine translation (MT) has achieved significant progress in the area of domain adaptation. However, in-domain data scarcity is common in translation settings, due to the lack of specialised datasets and terminology, or inconsistency and inaccuracy of available in-domain translations. In such scenarios where there is insufficient in-domain data to fine-tune MT models, producing translations that are consistent with the relevant context is challenging. While real-time adaptation can make use of smaller amounts of in-domain data to improve the translation on the fly, it remains challenging due to supported context limitations and efficiency constraints. Large language models (LLMs) have recently shown interesting capabilities of in-context learning, where they learn to replicate certain input-output text generation patterns, without further fine-tuning. Such capabilities have opened new horizons for domain-specific data augmentation and real-time adaptive MT. This work attempts to address two main relevant questions: 1) in scenarios involving human interaction and continuous feedback, can we employ language models to improve the quality of adaptive MT at inference time? and 2) in the absence of sufficient in-domain data, can we use pre-trained large-scale language models to improve the process of MT domain adaptation?
- [1511] arXiv:2401.14571 (cross-list from cs.HC) [ pdf , other ]
-
Title: Driving Towards Inclusion: Revisiting In-Vehicle Interaction in Autonomous VehiclesSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: This paper presents a comprehensive literature review of the current state of in-vehicle human-computer interaction (HCI) in the context of self-driving vehicles, with a specific focus on inclusion and accessibility. This study's aim is to examine the user-centered design principles for inclusive HCI in self-driving vehicles, evaluate existing HCI systems, and identify emerging technologies that have the potential to enhance the passenger experience. The paper begins by providing an overview of the current state of self-driving vehicle technology, followed by an examination of the importance of HCI in this context. Next, the paper reviews the existing literature on inclusive HCI design principles and evaluates the effectiveness of current HCI systems in self-driving vehicles. The paper also identifies emerging technologies that have the potential to enhance the passenger experience, such as voice-activated interfaces, haptic feedback systems, and augmented reality displays. Finally, the paper proposes an end-to-end design framework for the development of an inclusive in-vehicle experience, which takes into consideration the needs of all passengers, including those with disabilities, or other accessibility requirements. This literature review highlights the importance of user-centered design principles in the development of HCI systems for self-driving vehicles and emphasizes the need for inclusive design to ensure that all passengers can safely and comfortably use these vehicles. The proposed end-to-end design framework provides a practical approach to achieving this goal and can serve as a valuable resource for designers, researchers, and policymakers in this field.
- [1512] arXiv:2401.14589 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive BiasAuthors: Yu He Ke , Rui Yang , Sui An Lie , Taylor Xin Yi Lim , Hairil Rizal Abdullah , Daniel Shu Wei Ting , Nan LiuComments: 22 pages, 3 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field. This study explores the role of large language models (LLMs) in mitigating these biases through the utilization of a multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy. Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent system, we leveraged GPT-4 Turbo to facilitate interactions among four simulated agents to replicate clinical team dynamics. Each agent has a distinct role: 1) To make the initial and final diagnosis after considering the discussions, 2) The devil's advocate and correct confirmation and anchoring bias, 3) The tutor and facilitator of the discussion to reduce premature closure bias, and 4) To record and summarize the findings. A total of 80 simulations were evaluated for the accuracy of initial diagnosis, top differential diagnosis and final two differential diagnoses. Findings: In a total of 80 responses evaluating both initial and final diagnoses, the initial diagnosis had an accuracy of 0% (0/80), but following multi-agent discussions, the accuracy for the top differential diagnosis increased to 71.3% (57/80), and for the final two differential diagnoses, to 80.0% (64/80). The system demonstrated an ability to reevaluate and correct misconceptions, even in scenarios with misleading initial investigations. Interpretation: The LLM-driven multi-agent conversation system shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios.
- [1513] arXiv:2401.14616 (cross-list from cs.CL) [ pdf , other ]
-
Title: Alternative Speech: Complementary Method to Counter-Narrative for Better DiscourseComments: Accepted for The First Workshop on Data-Centric AI (DCAI) at ICDM 2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We introduce the concept of "Alternative Speech" as a new way to directly combat hate speech and complement the limitations of counter-narrative. An alternative speech provides practical alternatives to hate speech in real-world scenarios by offering speech-level corrections to speakers while considering the surrounding context and promoting speakers to reform. Further, an alternative speech can combat hate speech alongside counter-narratives, offering a useful tool to address social issues such as racial discrimination and gender inequality. We propose the new concept and provide detailed guidelines for constructing the necessary dataset. Through discussion, we demonstrate that combining alternative speech and counter-narrative can be a more effective strategy for combating hate speech by complementing specificity and guiding capacity of counter-narrative. This paper presents another perspective for dealing with hate speech, offering viable remedies to complement the constraints of current approaches to mitigating harmful bias.
- [1514] arXiv:2401.14617 (cross-list from cs.SE) [ pdf , other ]
-
Title: A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering ResearchAuthors: Sicong Cao , Xiaobing Sun , Ratnadira Widyasari , David Lo , Xiaoxue Wu , Lili Bo , Jiale Zhang , Bin Li , Wei Liu , Di Wu , Yixin ChenComments: submitted to ACM Computing Surveys. arXiv admin note: text overlap with arXiv:2202.06840 by other authorsSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, including Software Engineering (SE). However, due to their black-box nature, these promising AI-driven SE models are still far from being deployed in practice. This lack of explainability poses unwanted risks for their applications in critical tasks, such as vulnerability detection, where decision-making transparency is of paramount importance. This paper endeavors to elucidate this interdisciplinary domain by presenting a systematic literature review of approaches that aim to improve the explainability of AI models within the context of SE. The review canvasses work appearing in the most prominent SE & AI conferences and journals, and spans 63 papers across 21 unique SE tasks. Based on three key Research Questions (RQs), we aim to (1) summarize the SE tasks where XAI techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches. Based on our findings, we identified a set of challenges remaining to be addressed in existing studies, together with a roadmap highlighting potential opportunities we deemed appropriate and important for future work.
- [1515] arXiv:2401.14630 (cross-list from cs.CL) [ pdf , other ]
-
Title: An Empirical Investigation of Domain Adaptation Ability for Chinese Spelling Check ModelsComments: ICASSP2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Chinese Spelling Check (CSC) is a meaningful task in the area of Natural Language Processing (NLP) which aims at detecting spelling errors in Chinese texts and then correcting these errors. However, CSC models are based on pretrained language models, which are trained on a general corpus. Consequently, their performance may drop when confronted with downstream tasks involving domain-specific terms. In this paper, we conduct a thorough evaluation about the domain adaption ability of various typical CSC models by building three new datasets encompassing rich domain-specific terms from the financial, medical, and legal domains. Then we conduct empirical investigations in the corresponding domain-specific test datasets to ascertain the cross-domain adaptation ability of several typical CSC models. We also test the performance of the popular large language model ChatGPT. As shown in our experiments, the performances of the CSC models drop significantly in the new domains.
- [1516] arXiv:2401.14694 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: TA-RNN: an Attention-based Time-aware Recurrent Neural Network Architecture for Electronic Health RecordsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Motivation: Electronic Health Records (EHR) represent a comprehensive resource of a patient's medical history. EHR are essential for utilizing advanced technologies such as deep learning (DL), enabling healthcare providers to analyze extensive data, extract valuable insights, and make precise and data-driven clinical decisions. DL methods such as Recurrent Neural Networks (RNN) have been utilized to analyze EHR to model disease progression and predict diagnosis. However, these methods do not address some inherent irregularities in EHR data such as irregular time intervals between clinical visits. Furthermore, most DL models are not interpretable. In this study, we propose two interpretable DL architectures based on RNN, namely Time-Aware RNN (TA-RNN) and TA-RNN-Autoencoder (TA-RNN-AE) to predict patient's clinical outcome in EHR at next visit and multiple visits ahead, respectively. To mitigate the impact of irregular time intervals, we propose incorporating time embedding of the elapsed times between visits. For interpretability, we propose employing a dual-level attention mechanism that operates between visits and features within each visit.
Results: The results of the experiments conducted on Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC) datasets indicated superior performance of proposed models for predicting Alzheimer's Disease (AD) compared to state-of-the-art and baseline approaches based on F2 and sensitivity. Additionally, TA-RNN showed superior performance on Medical Information Mart for Intensive Care (MIMIC-III) dataset for mortality prediction. In our ablation study, we observed enhanced predictive performance by incorporating time embedding and attention mechanisms. Finally, investigating attention weights helped identify influential visits and features in predictions. - [1517] arXiv:2401.14696 (cross-list from cs.LG) [ pdf , other ]
-
Title: Asymptotic Midpoint Mixup for Margin Balancing and Moderate BroadeningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In the feature space, the collapse between features invokes critical problems in representation learning by remaining the features undistinguished. Interpolation-based augmentation methods such as mixup have shown their effectiveness in relieving the collapse problem between different classes, called inter-class collapse. However, intra-class collapse raised in coarse-to-fine transfer learning has not been discussed in the augmentation approach. To address them, we propose a better feature augmentation method, asymptotic midpoint mixup. The method generates augmented features by interpolation but gradually moves them toward the midpoint of inter-class feature pairs. As a result, the method induces two effects: 1) balancing the margin for all classes and 2) only moderately broadening the margin until it holds maximal confidence. We empirically analyze the collapse effects by measuring alignment and uniformity with visualizing representations. Then, we validate the intra-class collapse effects in coarse-to-fine transfer learning and the inter-class collapse effects in imbalanced learning on long-tailed datasets. In both tasks, our method shows better performance than other augmentation methods.
- [1518] arXiv:2401.14698 (cross-list from cs.CL) [ pdf , other ]
-
Title: Under the Surface: Tracking the Artifactuality of LLM-Generated DataAuthors: Debarati Das , Karin De Langis , Anna Martin-Boyle , Jaehyung Kim , Minhwa Lee , Zae Myung Kim , Shirley Anugrah Hayati , Risako Owan , Bin Hu , Ritik Parkar , Ryan Koo , Jonginn Park , Aahan Tyagi , Libby Ferland , Sanjali Roy , Vincent Liu , Dongyeop KangComments: Core Authors: Debarati Das, Karin De Langis, Anna Martin-Boyle, Jaehyung Kim, Minhwa Lee and Zae Myung Kim | Project lead : Debarati Das | PI : Dongyeop KangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This work delves into the expanding role of large language models (LLMs) in generating artificial data. LLMs are increasingly employed to create a variety of outputs, including annotations, preferences, instruction prompts, simulated dialogues, and free text. As these forms of LLM-generated data often intersect in their application, they exert mutual influence on each other and raise significant concerns about the quality and diversity of the artificial data incorporated into training cycles, leading to an artificial data ecosystem. To the best of our knowledge, this is the first study to aggregate various types of LLM-generated text data, from more tightly constrained data like "task labels" to more lightly constrained "free-form text". We then stress test the quality and implications of LLM-generated artificial data, comparing it with human data across various existing benchmarks. Despite artificial data's capability to match human performance, this paper reveals significant hidden disparities, especially in complex tasks where LLMs often miss the nuanced understanding of intrinsic human-generated content. This study critically examines diverse LLM-generated data and emphasizes the need for ethical practices in data creation and when using LLMs. It highlights the LLMs' shortcomings in replicating human traits and behaviors, underscoring the importance of addressing biases and artifacts produced in LLM-generated content for future research and development. All data and code are available on our project page.
- [1519] arXiv:2401.14702 (cross-list from cs.LG) [ pdf , other ]
-
Title: FairSample: Training Fair and Accurate Graph Convolutional Neural Networks EfficientlyComments: Accepted by TKDE 2023Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Fairness in Graph Convolutional Neural Networks (GCNs) becomes a more and more important concern as GCNs are adopted in many crucial applications. Societal biases against sensitive groups may exist in many real world graphs. GCNs trained on those graphs may be vulnerable to being affected by such biases. In this paper, we adopt the well-known fairness notion of demographic parity and tackle the challenge of training fair and accurate GCNs efficiently. We present an in-depth analysis on how graph structure bias, node attribute bias, and model parameters may affect the demographic parity of GCNs. Our insights lead to FairSample, a framework that jointly mitigates the three types of biases. We employ two intuitive strategies to rectify graph structures. First, we inject edges across nodes that are in different sensitive groups but similar in node features. Second, to enhance model fairness and retain model quality, we develop a learnable neighbor sampling policy using reinforcement learning. To address the bias in node features and model parameters, FairSample is complemented by a regularization objective to optimize fairness.
- [1520] arXiv:2401.14707 (cross-list from cs.CV) [ pdf , other ]
-
Title: Mitigating Feature Gap for Adversarial Robustness by Feature DisentanglementComments: 8 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Deep neural networks are vulnerable to adversarial samples. Adversarial fine-tuning methods aim to enhance adversarial robustness through fine-tuning the naturally pre-trained model in an adversarial training manner. However, we identify that some latent features of adversarial samples are confused by adversarial perturbation and lead to an unexpectedly increasing gap between features in the last hidden layer of natural and adversarial samples. To address this issue, we propose a disentanglement-based approach to explicitly model and further remove the latent features that cause the feature gap. Specifically, we introduce a feature disentangler to separate out the latent features from the features of the adversarial samples, thereby boosting robustness by eliminating the latent features. Besides, we align features in the pre-trained model with features of adversarial samples in the fine-tuned model, to further benefit from the features from natural samples without confusion. Empirical evaluations on three benchmark datasets demonstrate that our approach surpasses existing adversarial fine-tuning methods and adversarial training baselines.
- [1521] arXiv:2401.14717 (cross-list from cs.CL) [ pdf , other ]
-
Title: Turn-taking and Backchannel Prediction with Acoustic and Large Language Model FusionAuthors: Jinhan Wang , Long Chen , Aparna Khare , Anirudh Raju , Pranav Dheram , Di He , Minhua Wu , Andreas Stolcke , Venkatesh RavichandranComments: To appear in IEEE ICASSP 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate that our approach consistently outperforms the baseline models with single modality. We also develop a novel multi-task instruction fine-tuning strategy to further benefit from LLM-encoded knowledge for understanding the tasks and conversational contexts, leading to additional improvements. Our approach demonstrates the potential of combined LLMs and acoustic models for a more natural and conversational interaction between humans and speech-enabled AI agents.
- [1522] arXiv:2401.14749 (cross-list from cs.IT) [ pdf , other ]
-
Title: Topology-Aware Exploration of Energy-Based Models Equilibrium: Toric QC-LDPC Codes and Hyperbolic MET QC-LDPC CodesComments: 16 pages, 29 figures. arXiv admin note: text overlap with arXiv:2307.15778Subjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: This paper presents a method for achieving equilibrium in the ISING Hamiltonian when confronted with unevenly distributed charges on an irregular grid. Employing (Multi-Edge) QC-LDPC codes and the Boltzmann machine, our approach involves dimensionally expanding the system, substituting charges with circulants, and representing distances through circulant shifts. This results in a systematic mapping of the charge system onto a space, transforming the irregular grid into a uniform configuration, applicable to Torical and Circular Hyperboloid Topologies. The paper covers fundamental definitions and notations related to QC-LDPC Codes, Multi-Edge QC-LDPC codes, and the Boltzmann machine. It explores the marginalization problem in code on the graph probabilistic models for evaluating the partition function, encompassing exact and approximate estimation techniques. Rigorous proof is provided for the attainability of equilibrium states for the Boltzmann machine under Torical and Circular Hyperboloid, paving the way for the application of our methodology. Practical applications of our approach are investigated in Finite Geometry QC-LDPC Codes, specifically in Material Science. The paper further explores its effectiveness in the realm of Natural Language Processing Transformer Deep Neural Networks, examining Generalized Repeat Accumulate Codes, Spatially-Coupled and Cage-Graph QC-LDPC Codes. The versatile and impactful nature of our topology-aware hardware-efficient quasi-cycle codes equilibrium method is showcased across diverse scientific domains without the use of specific section delineations.
- [1523] arXiv:2401.14777 (cross-list from cs.CL) [ pdf , other ]
-
Title: Large Language Model Adaptation for Financial Sentiment AnalysisSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Natural language processing (NLP) has recently gained relevance within financial institutions by providing highly valuable insights into companies and markets' financial documents. However, the landscape of the financial domain presents extra challenges for NLP, due to the complexity of the texts and the use of specific terminology. Generalist language models tend to fall short in tasks specifically tailored for finance, even when using large language models (LLMs) with great natural language understanding and generative capabilities. This paper presents a study on LLM adaptation methods targeted at the financial domain and with high emphasis on financial sentiment analysis. To this purpose, two foundation models with less than 1.5B parameters have been adapted using a wide range of strategies. We show that through careful fine-tuning on both financial documents and instructions, these foundation models can be adapted to the target domain. Moreover, we observe that small LLMs have comparable performance to larger scale models, while being more efficient in terms of parameters and data. In addition to the models, we show how to generate artificial instructions through LLMs to augment the number of samples of the instruction dataset.
- [1524] arXiv:2401.14831 (cross-list from cs.RO) [ pdf , other ]
-
Title: The Machine Vision Iceberg Explained: Advancing Dynamic Testing by Considering Holistic Environmental CircumstancesComments: Submitted at IEEE IV 2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE); Image and Video Processing (eess.IV)
Abstract: Are we heading for an iceberg with the current testing of machine vision? This work delves into the landscape of Machine Vision (MV) testing, which is heavily required in Highly Automated Driving (HAD) systems. Utilizing the metaphorical notion of navigating towards an iceberg, we discuss the potential shortcomings concealed within current testing strategies. We emphasize the urgent need for a deeper understanding of how to deal with the opaque functions of MV in development processes. As overlooked considerations can cost lives. Our main contribution is the hierarchical level model, which we call Granularity Grades. The model encourages a refined exploration of the multi-scaled depths of understanding about the circumstances of environments in which MV is intended to operate. This model aims to provide a holistic overview of all entities that may impact MV functions, ranging from relations of individual entities like object attributes to entire environmental scenes. The application of our model delivers a structured exploration of entities in a specific domain, their relationships and assigning results of a MV-under-test to construct an entity-relationship graph. Through clustering patterns of relations in the graph general MV deficits are arguable. In Summary, our work contributes to a more nuanced and systematized identification of deficits of a MV test object in correlation to holistic circumstances in HAD operating domains.
- [1525] arXiv:2401.14856 (cross-list from cs.CV) [ pdf , other ]
-
Title: Memory-Inspired Temporal Prompt Interaction for Text-Image ClassificationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, large-scale pre-trained multimodal models (LMM) generally emerge to integrate the vision and language modalities, achieving considerable success in various natural language processing and computer vision tasks. The growing size of LMMs, however, results in a significant computational cost for fine-tuning these models for downstream tasks. Hence, prompt-based interaction strategy is studied to align modalities more efficiently. In this contex, we propose a novel prompt-based multimodal interaction strategy inspired by human memory strategy, namely Memory-Inspired Temporal Prompt Interaction (MITP). Our proposed method involves in two stages as in human memory strategy: the acquiring stage, and the consolidation and activation stage. We utilize temporal prompts on intermediate layers to imitate the acquiring stage, leverage similarity-based prompt interaction to imitate memory consolidation, and employ prompt generation strategy to imitate memory activation. The main strength of our paper is that we interact the prompt vectors on intermediate layers to leverage sufficient information exchange between modalities, with compressed trainable parameters and memory usage. We achieve competitive results on several datasets with relatively small memory usage and 2.0M of trainable parameters (about 1% of the pre-trained foundation model).
- [1526] arXiv:2401.14876 (cross-list from cs.LG) [ pdf , other ]
-
Title: Cross-Space Adaptive Filter: Integrating Graph Topology and Node Attributes for Alleviating the Over-smoothing ProblemComments: Accepted to WWW 2024. V2: update the results on GCN-BC based on our rebuttal on OpenReview. Our code is available at this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The vanilla Graph Convolutional Network (GCN) uses a low-pass filter to extract low-frequency signals from graph topology, which may lead to the over-smoothing problem when GCN goes deep. To this end, various methods have been proposed to create an adaptive filter by incorporating an extra filter (e.g., a high-pass filter) extracted from the graph topology. However, these methods heavily rely on topological information and ignore the node attribute space, which severely sacrifices the expressive power of the deep GCNs, especially when dealing with disassortative graphs. In this paper, we propose a cross-space adaptive filter, called CSF, to produce the adaptive-frequency information extracted from both the topology and attribute spaces. Specifically, we first derive a tailored attribute-based high-pass filter that can be interpreted theoretically as a minimizer for semi-supervised kernel ridge regression. Then, we cast the topology-based low-pass filter as a Mercer's kernel within the context of GCNs. This serves as a foundation for combining it with the attribute-based filter to capture the adaptive-frequency information. Finally, we derive the cross-space filter via an effective multiple-kernel learning strategy, which unifies the attribute-based high-pass filter and the topology-based low-pass filter. This helps to address the over-smoothing problem while maintaining effectiveness. Extensive experiments demonstrate that CSF not only successfully alleviates the over-smoothing problem but also promotes the effectiveness of the node classification task.
- [1527] arXiv:2401.14915 (cross-list from cs.HC) [ pdf , other ]
-
Title: Charting the Future of AI in Project-Based Learning: A Co-Design Exploration with StudentsAuthors: Chengbo Zheng , Kangyu Yuan , Bingcan Guo , Reza Hadi Mogavi , Zhenhui Peng , Shuai Ma , Xiaojuan MaComments: Conditionally accepted by CHI '24Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: The increasing use of Artificial Intelligence (AI) by students in learning presents new challenges for assessing their learning outcomes in project-based learning (PBL). This paper introduces a co-design study to explore the potential of students' AI usage data as a novel material for PBL assessment. We conducted workshops with 18 college students, encouraging them to speculate an alternative world where they could freely employ AI in PBL while needing to report this process to assess their skills and contributions. Our workshops yielded various scenarios of students' use of AI in PBL and ways of analyzing these uses grounded by students' vision of education goal transformation. We also found students with different attitudes toward AI exhibited distinct preferences in how to analyze and understand the use of AI. Based on these findings, we discuss future research opportunities on student-AI interactions and understanding AI-enhanced learning.
- [1528] arXiv:2401.14931 (cross-list from cs.CL) [ pdf , other ]
-
Title: Do LLMs Dream of Ontologies?Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) have recently revolutionized automated text understanding and generation. The performance of these models relies on the high number of parameters of the underlying neural architectures, which allows LLMs to memorize part of the vast quantity of data seen during the training. This paper investigates whether and to what extent general-purpose pre-trained LLMs have memorized information from known ontologies. Our results show that LLMs partially know ontologies: they can, and do indeed, memorize concepts from ontologies mentioned in the text, but the level of memorization of their concepts seems to vary proportionally to their popularity on the Web, the primary source of their training material. We additionally propose new metrics to estimate the degree of memorization of ontological information in LLMs by measuring the consistency of the output produced across different prompt repetitions, query languages, and degrees of determinism.
- [1529] arXiv:2401.14948 (cross-list from cs.LG) [ pdf , other ]
-
Title: Conserve-Update-Revise to Cure Generalization and Robustness Trade-off in Adversarial TrainingComments: Accepted as a conference paper at ICLR 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Adversarial training improves the robustness of neural networks against adversarial attacks, albeit at the expense of the trade-off between standard and robust generalization. To unveil the underlying factors driving this phenomenon, we examine the layer-wise learning capabilities of neural networks during the transition from a standard to an adversarial setting. Our empirical findings demonstrate that selectively updating specific layers while preserving others can substantially enhance the network's learning capacity. We therefore propose CURE, a novel training framework that leverages a gradient prominence criterion to perform selective conservation, updating, and revision of weights. Importantly, CURE is designed to be dataset- and architecture-agnostic, ensuring its applicability across various scenarios. It effectively tackles both memorization and overfitting issues, thus enhancing the trade-off between robustness and generalization and additionally, this training approach also aids in mitigating "robust overfitting". Furthermore, our study provides valuable insights into the mechanisms of selective adversarial training and offers a promising avenue for future research.
- [1530] arXiv:2401.14953 (cross-list from cs.LG) [ pdf , other ]
-
Title: Learning Universal PredictorsAuthors: Jordi Grau-Moya , Tim Genewein , Marcus Hutter , Laurent Orseau , Grégoire Delétang , Elliot Catt , Anian Ruoss , Li Kevin Wenliang , Christopher Mattern , Matthew Aitchison , Joel VenessComments: 32 pages, 11 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data. Broad exposure to different tasks leads to versatile representations enabling general problem solving. But, what are the limits of meta-learning? In this work, we explore the potential of amortizing the most powerful universal predictor, namely Solomonoff Induction (SI), into neural networks via leveraging meta-learning to its limits. We use Universal Turing Machines (UTMs) to generate training data used to expose networks to a broad range of patterns. We provide theoretical analysis of the UTM data generation processes and meta-training protocols. We conduct comprehensive experiments with neural architectures (e.g. LSTMs, Transformers) and algorithmic data generators of varying complexity and universality. Our results suggest that UTM data is a valuable resource for meta-learning, and that it can be used to train neural networks capable of learning universal prediction strategies.
- [1531] arXiv:2401.14968 (cross-list from cs.DC) [ pdf , ps , other ]
-
Title: Atmosphere: Context and situational-aware collaborative IoT architecture for edge-fog-cloud computingJournal-ref: Comput. Stand. Interfaces 79: 103550 (2022)Subjects: Distributed, Parallel, and Cluster Computing (cs.DC) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: The Internet of Things (IoT) has grown significantly in popularity, accompanied by increased capacity and lower cost of communications, and overwhelming development of technologies. At the same time, big data and real-time data analysis have taken on great importance and have been accompanied by unprecedented interest in sharing data among citizens, public administrations and other organisms, giving rise to what is known as the Collaborative Internet of Things. This growth in data and infrastructure must be accompanied by a software architecture that allows its exploitation. Although there are various proposals focused on the exploitation of the IoT at edge, fog and/or cloud levels, it is not easy to find a software solution that exploits the three tiers together, taking maximum advantage not only of the analysis of contextual and situational data at each tier, but also of two-way communications between adjacent ones. In this paper, we propose an architecture that solves these deficiencies by proposing novel technologies which are appropriate for managing the resources of each tier: edge, fog and cloud. In addition, the fact that two-way communications along the three tiers of the architecture is allowed considerably enriches the contextual and situational information in each layer, and substantially assists decision making in real time. The paper illustrates the proposed software architecture through a case study of respiratory disease surveillance in hospitals. As a result, the proposed architecture permits efficient communications between the different tiers responding to the needs of these types of IoT scenarios.
- [1532] arXiv:2401.15006 (cross-list from cs.CL) [ pdf , other ]
-
Title: Airavata: Introducing Hindi Instruction-tuned LLMAuthors: Jay Gala , Thanmay Jayakumar , Jaavid Aktar Husain , Aswanth Kumar M , Mohammed Safi Ur Rahman Khan , Diptesh Kanojia , Ratish Puduppully , Mitesh M. Khapra , Raj Dabre , Rudra Murthy , Anoop KunchukuttanComments: Work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make it better suited for assistive tasks. Along with the model, we also share the IndicInstruct dataset, which is a collection of diverse instruction-tuning datasets to enable further research for Indic LLMs. Additionally, we present evaluation benchmarks and a framework for assessing LLM performance across tasks in Hindi. Currently, Airavata supports Hindi, but we plan to expand this to all 22 scheduled Indic languages. You can access all artifacts at this https URL .
- [1533] arXiv:2401.15042 (cross-list from cs.CL) [ pdf , other ]
-
Title: PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language ModelsAuthors: Haochen Tan , Zhijiang Guo , Zhan Shi , Lu Xu , Zhili Liu , Yunlong Feng , Xiaoguang Li , Yasheng Wang , Lifeng Shang , Qun Liu , Linqi SongSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have exhibited remarkable success in long-form context comprehension tasks. However, their capacity to generate long contents, such as reports and articles, remains insufficiently explored. Current benchmarks do not adequately assess LLMs' ability to produce informative and comprehensive content, necessitating a more rigorous evaluation approach. In this study, we introduce \textsc{ProxyQA}, a framework for evaluating long-form text generation, comprising in-depth human-curated \textit{meta-questions} spanning various domains. Each meta-question contains corresponding \textit{proxy-questions} with annotated answers. LLMs are prompted to generate extensive content in response to these meta-questions. Utilizing an evaluator and incorporating generated content as background context, \textsc{ProxyQA} evaluates the quality of generated content based on the evaluator's performance in answering the \textit{proxy-questions}. We examine multiple LLMs, emphasizing \textsc{ProxyQA}'s demanding nature as a high-quality assessment tool. Human evaluation demonstrates that evaluating through \textit{proxy-questions} is a highly self-consistent and human-criteria-correlated validation method. The dataset and leaderboard will be available at \url{ this https URL }.
- [1534] arXiv:2401.15043 (cross-list from cs.CL) [ pdf , other ]
-
Title: Health Text Simplification: An Annotated Corpus for Digestive Cancer Education and Novel Strategies for Reinforcement LearningAuthors: Md Mushfiqur Rahman , Mohammad Sabik Irbaz , Kai North , Michelle S. Williams , Marcos Zampieri , Kevin LybargerSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Objective: The reading level of health educational materials significantly influences the understandability and accessibility of the information, particularly for minoritized populations. Many patient educational resources surpass the reading level and complexity of widely accepted standards. There is a critical need for high-performing text simplification models in health information to enhance dissemination and literacy. This need is particularly acute in cancer education, where effective prevention and screening education can substantially reduce morbidity and mortality.
Methods: We introduce Simplified Digestive Cancer (SimpleDC), a parallel corpus of cancer education materials tailored for health text simplification research, comprising educational content from the American Cancer Society, Centers for Disease Control and Prevention, and National Cancer Institute. Utilizing SimpleDC alongside the existing Med-EASi corpus, we explore Large Language Model (LLM)-based simplification methods, including fine-tuning, reinforcement learning (RL), reinforcement learning with human feedback (RLHF), domain adaptation, and prompt-based approaches. Our experimentation encompasses Llama 2 and GPT-4. A novel RLHF reward function is introduced, featuring a lightweight model adept at distinguishing between original and simplified texts, thereby enhancing the model's effectiveness with unlabeled data.
Results: Fine-tuned Llama 2 models demonstrated high performance across various metrics. Our innovative RLHF reward function surpassed existing RL text simplification reward functions in effectiveness. The results underscore that RL/RLHF can augment fine-tuning, facilitating model training on unlabeled text and improving performance. - [1535] arXiv:2401.15075 (cross-list from cs.CV) [ pdf , other ]
-
Title: Annotated Hands for Generative ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: Generative models such as GANs and diffusion models have demonstrated impressive image generation capabilities. Despite these successes, these systems are surprisingly poor at creating images with hands. We propose a novel training framework for generative models that substantially improves the ability of such systems to create hand images. Our approach is to augment the training images with three additional channels that provide annotations to hands in the image. These annotations provide additional structure that coax the generative model to produce higher quality hand images. We demonstrate this approach on two different generative models: a generative adversarial network and a diffusion model. We demonstrate our method both on a new synthetic dataset of hand images and also on real photographs that contain hands. We measure the improved quality of the generated hands through higher confidence in finger joint identification using an off-the-shelf hand detector.
- [1536] arXiv:2401.15098 (cross-list from cs.LG) [ pdf , other ]
-
Title: Hierarchical Continual Reinforcement Learning via Large Language ModelSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The ability to learn continuously in dynamic environments is a crucial requirement for reinforcement learning (RL) agents applying in the real world. Despite the progress in continual reinforcement learning (CRL), existing methods often suffer from insufficient knowledge transfer, particularly when the tasks are diverse. To address this challenge, we propose a new framework, Hierarchical Continual reinforcement learning via large language model (Hi-Core), designed to facilitate the transfer of high-level knowledge. Hi-Core orchestrates a twolayer structure: high-level policy formulation by a large language model (LLM), which represents agenerates a sequence of goals, and low-level policy learning that closely aligns with goal-oriented RL practices, producing the agent's actions in response to the goals set forth. The framework employs feedback to iteratively adjust and verify highlevel policies, storing them along with low-level policies within a skill library. When encountering a new task, Hi-Core retrieves relevant experience from this library to help to learning. Through experiments on Minigrid, Hi-Core has demonstrated its effectiveness in handling diverse CRL tasks, which outperforms popular baselines.
- [1537] arXiv:2401.15103 (cross-list from cs.LG) [ pdf , other ]
-
Title: PruneSymNet: A Symbolic Neural Network and Pruning Algorithm for Symbolic RegressionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Symbolic regression aims to derive interpretable symbolic expressions from data in order to better understand and interpret data. %which plays an important role in knowledge discovery and interpretable machine learning.
In this study, a symbolic network called PruneSymNet is proposed for symbolic regression. This is a novel neural network whose activation function consists of common elementary functions and operators. The whole network is differentiable and can be trained by gradient descent method. Each subnetwork in the network corresponds to an expression, and our goal is to extract such subnetworks to get the desired symbolic expression.
Therefore, a greedy pruning algorithm is proposed to prune the network into a subnetwork while ensuring the accuracy of data fitting. The proposed greedy pruning algorithm preserves the edge with the least loss in each pruning, but greedy algorithm often can not get the optimal solution. In order to alleviate this problem, we combine beam search during pruning to obtain multiple candidate expressions each time, and finally select the expression with the smallest loss as the final result. It was tested on the public data set and compared with the current popular algorithms. The results showed that the proposed algorithm had better accuracy. - [1538] arXiv:2401.15106 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Decision Theoretic Foundations for Experiments Evaluating Human DecisionsSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Decision-making with information displays is a key focus of research in areas like explainable AI, human-AI teaming, and data visualization. However, what constitutes a decision problem, and what is required for an experiment to be capable of concluding that human decisions are flawed in some way, remain open to speculation. We present a widely applicable definition of a decision problem synthesized from statistical decision theory and information economics. We argue that to attribute loss in human performance to forms of bias, an experiment must provide participants with the information that a rational agent would need to identify the normative decision. We evaluate the extent to which recent evaluations of decision-making from the literature on AI-assisted decisions achieve this criteria. We find that only 10 (26\%) of 39 studies that claim to identify biased behavior present participants with sufficient information to characterize their behavior as deviating from good decision-making in at least one treatment condition. We motivate the value of studying well-defined decision problems by describing a characterization of performance losses they allow us to conceive. In contrast, the ambiguities of a poorly communicated decision problem preclude normative interpretation. We conclude with recommendations for practice.
- [1539] arXiv:2401.15108 (cross-list from cs.LG) [ pdf , other ]
-
Title: Multi-agent Deep Reinforcement Learning for Dynamic Pricing by Fast-charging Electric Vehicle Hubs in ccompetitionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); General Economics (econ.GN); Systems and Control (eess.SY)
Abstract: Fast-charging hubs for electric vehicles will soon become part of the newly built infrastructure for transportation electrification across the world. These hubs are expected to host many DC fast-charging stations and will admit EVs only for charging. Like the gasoline refueling stations, fast-charging hubs in a neighborhood will dynamically vary their prices to compete for the same pool of EV owners. These hubs will interact with the electric power network by making purchase commitments for a significant part of their power needs in the day-ahead (DA) electricity market and meeting the difference from the real-time (RT) market. Hubs may have supplemental battery storage systems (BSS), which they will use for arbitrage. In this paper, we develop a two-step data-driven dynamic pricing methodology for hubs in price competition. We first obtain the DA commitment by solving a stochastic DA commitment model. Thereafter we obtain the hub pricing strategies by modeling the game as a competitive Markov decision process (CMDP) and solving it using a multi-agent deep reinforcement learning (MADRL) approach. We develop a numerical case study for a pricing game between two charging hubs. We solve the case study with our methodology by using combinations of two different DRL algorithms, DQN and SAC, and two different neural networks (NN) architectures, a feed-forward (FF) neural network, and a multi-head attention (MHA) neural network. We construct a measure of collusion (index) using the hub profits. A value of zero for this index indicates no collusion (perfect competition) and a value of one indicates full collusion (monopolistic behavior). Our results show that the collusion index varies approximately between 0.14 and 0.45 depending on the combinations of the algorithms and the architectures chosen by the hubs.
- [1540] arXiv:2401.15109 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Towards Collective Superintelligence: Amplifying Group IQ using Conversational SwarmsSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Swarm Intelligence (SI) is a natural phenomenon that enables biological groups to amplify their combined intellect by forming real-time systems. Artificial Swarm Intelligence (or Swarm AI) is a technology that enables networked human groups to amplify their combined intelligence by forming similar systems. In the past, swarm-based methods were constrained to narrowly defined tasks like probabilistic forecasting and multiple-choice decision making. A new technology called Conversational Swarm Intelligence (CSI) was developed in 2023 that amplifies the decision-making accuracy of networked human groups through natural conversational deliberations. The current study evaluated the ability of real-time groups using a CSI platform to take a common IQ test known as Raven's Advanced Progressive Matrices (RAPM). First, a baseline group of participants took the Raven's IQ test by traditional survey. This group averaged 45.6% correct. Then, groups of approximately 35 individuals answered IQ test questions together using a CSI platform called Thinkscape. These groups averaged 80.5% correct. This places the CSI groups in the 97th percentile of IQ test-takers and corresponds to an effective IQ increase of 28 points (p<0.001). This is an encouraging result and suggests that CSI is a powerful method for enabling conversational collective intelligence in large, networked groups. In addition, because CSI is scalable across groups of potentially any size, this technology may provide a viable pathway to building a Collective Superintelligence.
- [1541] arXiv:2401.15118 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: GeoDecoder: Empowering Multimodal Map UnderstandingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents GeoDecoder, a dedicated multimodal model designed for processing geospatial information in maps. Built on the BeitGPT architecture, GeoDecoder incorporates specialized expert modules for image and text processing. On the image side, GeoDecoder utilizes GaoDe Amap as the underlying base map, which inherently encompasses essential details about road and building shapes, relative positions, and other attributes. Through the utilization of rendering techniques, the model seamlessly integrates external data and features such as symbol markers, drive trajectories, heatmaps, and user-defined markers, eliminating the need for extra feature engineering. The text module of GeoDecoder accepts various context texts and question prompts, generating text outputs in the style of GPT. Furthermore, the GPT-based model allows for the training and execution of multiple tasks within the same model in an end-to-end manner. To enhance map cognition and enable GeoDecoder to acquire knowledge about the distribution of geographic entities in Beijing, we devised eight fundamental geospatial tasks and conducted pretraining of the model using large-scale text-image samples. Subsequently, rapid fine-tuning was performed on three downstream tasks, resulting in significant performance improvements. The GeoDecoder model demonstrates a comprehensive understanding of map elements and their associated operations, enabling efficient and high-quality application of diverse geospatial tasks in different business scenarios.
- [1542] arXiv:2401.15119 (cross-list from cs.LG) [ pdf , other ]
-
Title: Interpreting Time Series Transformer Models and Sensitivity Analysis of Population Age Groups to COVID-19 InfectionsAuthors: Md Khairul Islam , Tyler Valentine , Timothy Joowon Sue , Ayush Karmacharya , Luke Neil Benham , Zhengguang Wang , Kingsley Kim , Judy FoxSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Populations and Evolution (q-bio.PE)
Abstract: Interpreting deep learning time series models is crucial in understanding the model's behavior and learning patterns from raw data for real-time decision-making. However, the complexity inherent in transformer-based time series models poses challenges in explaining the impact of individual features on predictions. In this study, we leverage recent local interpretation methods to interpret state-of-the-art time series models. To use real-world datasets, we collected three years of daily case data for 3,142 US counties. Firstly, we compare six transformer-based models and choose the best prediction model for COVID-19 infection. Using 13 input features from the last two weeks, we can predict the cases for the next two weeks. Secondly, we present an innovative way to evaluate the prediction sensitivity to 8 population age groups over highly dynamic multivariate infection data. Thirdly, we compare our proposed perturbation-based interpretation method with related work, including a total of eight local interpretation methods. Finally, we apply our framework to traffic and electricity datasets, demonstrating that our approach is generic and can be applied to other time-series domains.
- [1543] arXiv:2401.15120 (cross-list from cs.CV) [ pdf , other ]
-
Title: Incorporating simulated spatial context information improves the effectiveness of contrastive learning modelsSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Visual learning often occurs in a specific context, where an agent acquires skills through exploration and tracking of its location in a consistent environment. The historical spatial context of the agent provides a similarity signal for self-supervised contrastive learning. We present a unique approach, termed Environmental Spatial Similarity (ESS), that complements existing contrastive learning methods. Using images from simulated, photorealistic environments as an experimental setting, we demonstrate that ESS outperforms traditional instance discrimination approaches. Moreover, sampling additional data from the same environment substantially improves accuracy and provides new augmentations. ESS allows remarkable proficiency in room classification and spatial prediction tasks, especially in unfamiliar environments. This learning paradigm has the potential to enable rapid visual learning in agents operating in new environments with unique visual characteristics. Potentially transformative applications span from robotics to space exploration. Our proof of concept demonstrates improved efficiency over methods that rely on extensive, disconnected datasets.
- [1544] arXiv:2401.15121 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Expressive Power of ReLU and Step Networks under Floating-Point OperationsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The study of the expressive power of neural networks has investigated the fundamental limits of neural networks. Most existing results assume real-valued inputs and parameters as well as exact operations during the evaluation of neural networks. However, neural networks are typically executed on computers that can only represent a tiny subset of the reals and apply inexact operations. In this work, we analyze the expressive power of neural networks under a more realistic setup: when we use floating-point numbers and operations. Our first set of results assumes floating-point operations where the significand of a float is represented by finite bits but its exponent can take any integer value. Under this setup, we show that neural networks using a binary threshold unit or ReLU can memorize any finite input/output pairs and can approximate any continuous function within a small error. We also show similar results on memorization and universal approximation when floating-point operations use finite bits for both significand and exponent; these results are applicable to many popular floating-point formats such as those defined in the IEEE 754 standard (e.g., 32-bit single-precision format) and bfloat16.
- [1545] arXiv:2401.15122 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Multi-Grained Symmetric Differential Equation Model for Learning Protein-Ligand Binding DynamicsAuthors: Shengchao Liu , Weitao Du , Yanjing Li , Zhuoxinran Li , Vignesh Bhethanabotla , Nakul Rampal , Omar Yaghi , Christian Borgs , Anima Anandkumar , Hongyu Guo , Jennifer ChayesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Biomolecules (q-bio.BM); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Abstract: In drug discovery, molecular dynamics (MD) simulation for protein-ligand binding provides a powerful tool for predicting binding affinities, estimating transport properties, and exploring pocket sites. There has been a long history of improving the efficiency of MD simulations through better numerical methods and, more recently, by utilizing machine learning (ML) methods. Yet, challenges remain, such as accurate modeling of extended-timescale simulations. To address this issue, we propose NeuralMD, the first ML surrogate that can facilitate numerical MD and provide accurate simulations in protein-ligand binding. We propose a principled approach that incorporates a novel physics-informed multi-grained group symmetric framework. Specifically, we propose (1) a BindingNet model that satisfies group symmetry using vector frames and captures the multi-level protein-ligand interactions, and (2) an augmented neural differential equation solver that learns the trajectory under Newtonian mechanics. For the experiment, we design ten single-trajectory and three multi-trajectory binding simulation tasks. We show the efficiency and effectiveness of NeuralMD, with a 2000$\times$ speedup over standard numerical MD simulation and outperforming all other ML approaches by up to 80% under the stability metric. We further qualitatively show that NeuralMD reaches more stable binding predictions compared to other machine learning methods.
- [1546] arXiv:2401.15123 (cross-list from cs.LG) [ pdf , other ]
-
Title: Large Language Model Guided Knowledge Distillation for Time Series Anomaly DetectionComments: 12 pages, 5 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Self-supervised methods have gained prominence in time series anomaly detection due to the scarcity of available annotations. Nevertheless, they typically demand extensive training data to acquire a generalizable representation map, which conflicts with scenarios of a few available samples, thereby limiting their performance. To overcome the limitation, we propose \textbf{AnomalyLLM}, a knowledge distillation-based time series anomaly detection approach where the student network is trained to mimic the features of the large language model (LLM)-based teacher network that is pretrained on large-scale datasets. During the testing phase, anomalies are detected when the discrepancy between the features of the teacher and student networks is large. To circumvent the student network from learning the teacher network's feature of anomalous samples, we devise two key strategies. 1) Prototypical signals are incorporated into the student network to consolidate the normal feature extraction. 2) We use synthetic anomalies to enlarge the representation gap between the two networks. AnomalyLLM demonstrates state-of-the-art performance on 15 datasets, improving accuracy by at least 14.5\% in the UCR dataset.
- [1547] arXiv:2401.15124 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Sensor-Based Data Acquisition via Ubiquitous Device to Detect Muscle Strength Training ActivitiesComments: 9 pages, 4 figures, AHFE International Conference on Human Factors in Design, Engineering, and ComputingSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Maintaining a high quality of life through physical activities (PA) to prevent health decline is crucial. However, the relationship between individuals health status, PA preferences, and motion factors is complex. PA discussions consistently show a positive correlation with healthy aging experiences, but no explicit relation to specific types of musculoskeletal exercises. Taking advantage of the increasingly widespread existence of smartphones, especially in Indonesia, this research utilizes embedded sensors for Human Activity Recognition (HAR). Based on 25 participants data, performing nine types of selected motion, this study has successfully identified important sensor attributes that play important roles in the right and left hands for muscle strength motions as the basis for developing machine learning models with the LSTM algorithm.
- [1548] arXiv:2401.15132 (cross-list from cs.HC) [ pdf , other ]
-
Title: On the Emergence of Symmetrical RealityAuthors: Zhenliang Zhang , Zeyu Zhang , Ziyuan Jiao , Yao Su , Hangxin Liu , Wei Wang , Song-Chun ZhuComments: IEEE VR 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Artificial intelligence (AI) has revolutionized human cognitive abilities and facilitated the development of new AI entities capable of interacting with humans in both physical and virtual environments. Despite the existence of virtual reality, mixed reality, and augmented reality for several years, integrating these technical fields remains a formidable challenge due to their disparate application directions. The advent of AI agents, capable of autonomous perception and action, further compounds this issue by exposing the limitations of traditional human-centered research approaches. It is imperative to establish a comprehensive framework that accommodates the dual perceptual centers of humans and AI agents in both physical and virtual worlds. In this paper, we introduce the symmetrical reality framework, which offers a unified representation encompassing various forms of physical-virtual amalgamations. This framework enables researchers to better comprehend how AI agents can collaborate with humans and how distinct technical pathways of physical-virtual integration can be consolidated from a broader perspective. We then delve into the coexistence of humans and AI, demonstrating a prototype system that exemplifies the operation of symmetrical reality systems for specific tasks, such as pouring water. Subsequently, we propose an instance of an AI-driven active assistance service that illustrates the potential applications of symmetrical reality. This paper aims to offer beneficial perspectives and guidance for researchers and practitioners in different fields, thus contributing to the ongoing research about human-AI coexistence in both physical and virtual environments.
- [1549] arXiv:2401.15170 (cross-list from cs.CL) [ pdf , other ]
-
Title: Scalable Qualitative Coding with LLMs: Chain-of-Thought Reasoning Matches Human Performance in Some Hermeneutic TasksAuthors: Zackary Okun DunivinSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Qualitative coding, or content analysis, extracts meaning from text to discern quantitative patterns across a corpus of texts. Recently, advances in the interpretive abilities of large language models (LLMs) offer potential for automating the coding process (applying category labels to texts), thereby enabling human researchers to concentrate on more creative research aspects, while delegating these interpretive tasks to AI. Our case study comprises a set of socio-historical codes on dense, paragraph-long passages representative of a humanistic study. We show that GPT-4 is capable of human-equivalent interpretations, whereas GPT-3.5 is not. Compared to our human-derived gold standard, GPT-4 delivers excellent intercoder reliability (Cohen's $\kappa \geq 0.79$) for 3 of 9 codes, and substantial reliability ($\kappa \geq 0.6$) for 8 of 9 codes. In contrast, GPT-3.5 greatly underperforms for all codes ($mean(\kappa) = 0.34$; $max(\kappa) = 0.55$). Importantly, we find that coding fidelity improves considerably when the LLM is prompted to give rationale justifying its coding decisions (chain-of-thought reasoning). We present these and other findings along with a set of best practices for adapting traditional codebooks for LLMs. Our results indicate that for certain codebooks, state-of-the-art LLMs are already adept at large-scale content analysis. Furthermore, they suggest the next generation of models will likely render AI coding a viable option for a majority of codebooks.
- [1550] arXiv:2401.15199 (cross-list from cs.LG) [ pdf , other ]
-
Title: SCANIA Component X Dataset: A Real-World Multivariate Time Series Dataset for Predictive MaintenanceComments: 10 pages, 8 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents a description of a real-world, multivariate time series dataset collected from an anonymized engine component (called Component X) of a fleet of trucks from SCANIA, Sweden. This dataset includes diverse variables capturing detailed operational data, repair records, and specifications of trucks while maintaining confidentiality by anonymization. It is well-suited for a range of machine learning applications, such as classification, regression, survival analysis, and anomaly detection, particularly when applied to predictive maintenance scenarios. The large population size and variety of features in the format of histograms and numerical counters, along with the inclusion of temporal information, make this real-world dataset unique in the field. The objective of releasing this dataset is to give a broad range of researchers the possibility of working with real-world data from an internationally well-known company and introduce a standard benchmark to the predictive maintenance field, fostering reproducible research.
- [1551] arXiv:2401.15210 (cross-list from cs.DB) [ pdf , ps , other ]
-
Title: Roq: Robust Query Optimization Based on a Risk-aware Learned Cost ModelComments: 13 pages, 9 figures, submitted to SIGMOD 2024Subjects: Databases (cs.DB) ; Artificial Intelligence (cs.AI)
Abstract: Query optimizers in relational database management systems (RDBMSs) search for execution plans expected to be optimal for a given queries. They use parameter estimates, often inaccurate, and make assumptions that may not hold in practice. Consequently, they may select execution plans that are suboptimal at runtime, when these estimates and assumptions are not valid, which may result in poor query performance. Therefore, query optimizers do not sufficiently support robust query optimization. Recent years have seen a surge of interest in using machine learning (ML) to improve efficiency of data systems and reduce their maintenance overheads, with promising results obtained in the area of query optimization in particular. In this paper, inspired by these advancements, and based on several years of experience of IBM Db2 in this journey, we propose Robust Optimization of Queries, (Roq), a holistic framework that enables robust query optimization based on a risk-aware learning approach. Roq includes a novel formalization of the notion of robustness in the context of query optimization and a principled approach for its quantification and measurement based on approximate probabilistic ML. It also includes novel strategies and algorithms for query plan evaluation and selection. Roq also includes a novel learned cost model that is designed to predict query execution cost and the associated risks and performs query optimization accordingly. We demonstrate experimentally that Roq provides significant improvements to robust query optimization compared to the state-of-the-art.
- [1552] arXiv:2401.15222 (cross-list from cs.CL) [ pdf , other ]
-
Title: Transfer Learning for the Prediction of Entity Modifiers in Clinical Text: Application to Opioid Use Disorder Case DetectionAuthors: Abdullateef I. Almudaifer , Whitney Covington , JaMor Hairston , Zachary Deitch , Ankit Anand , Caleb M. Carroll , Estera Crisan , William Bradford , Lauren Walter , Eaton Ellen , Sue S. Feldman , John D. OsborneComments: 18 pages, 2 figures, 6 tables. To be submitted to the Journal of Biomedical SemanticsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Background: The semantics of entities extracted from a clinical text can be dramatically altered by modifiers, including entity negation, uncertainty, conditionality, severity, and subject. Existing models for determining modifiers of clinical entities involve regular expression or features weights that are trained independently for each modifier.
Methods: We develop and evaluate a multi-task transformer architecture design where modifiers are learned and predicted jointly using the publicly available SemEval 2015 Task 14 corpus and a new Opioid Use Disorder (OUD) data set that contains modifiers shared with SemEval as well as novel modifiers specific for OUD. We evaluate the effectiveness of our multi-task learning approach versus previously published systems and assess the feasibility of transfer learning for clinical entity modifiers when only a portion of clinical modifiers are shared.
Results: Our approach achieved state-of-the-art results on the ShARe corpus from SemEval 2015 Task 14, showing an increase of 1.1% on weighted accuracy, 1.7% on unweighted accuracy, and 10% on micro F1 scores.
Conclusions: We show that learned weights from our shared model can be effectively transferred to a new partially matched data set, validating the use of transfer learning for clinical text modifiers - [1553] arXiv:2401.15238 (cross-list from cs.LG) [ pdf , other ]
-
Title: Deep Learning with Tabular Data: A Self-supervised ApproachAuthors: Tirth Kiranbhai VyasSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We have described a novel approach for training tabular data using the TabTransformer model with self-supervised learning. Traditional machine learning models for tabular data, such as GBDT are being widely used though our paper examines the effectiveness of the TabTransformer which is a Transformer based model optimised specifically for tabular data. The TabTransformer captures intricate relationships and dependencies among features in tabular data by leveraging the self-attention mechanism of Transformers. We have used a self-supervised learning approach in this study, where the TabTransformer learns from unlabelled data by creating surrogate supervised tasks, eliminating the need for the labelled data. The aim is to find the most effective TabTransformer model representation of categorical and numerical features. To address the challenges faced during the construction of various input settings into the Transformers. Furthermore, a comparative analysis is also been conducted to examine performance of the TabTransformer model against baseline models such as MLP and supervised TabTransformer.
The research has presented with a novel approach by creating various variants of TabTransformer model namely, Binned-TT, Vanilla-MLP-TT, MLP- based-TT which has helped to increase the effective capturing of the underlying relationship between various features of the tabular dataset by constructing optimal inputs. And further we have employed a self-supervised learning approach in the form of a masking-based unsupervised setting for tabular data. The findings shed light on the best way to represent categorical and numerical features, emphasizing the TabTransormer performance when compared to established machine learning models and other self-supervised learning methods. - [1554] arXiv:2401.15241 (cross-list from cs.CL) [ pdf , other ]
-
Title: Unlearning Reveals the Influential Training Data of Language ModelsComments: 12 pages, under reviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In order to enhance the performance of language models while mitigating the risks of generating harmful content, it is crucial to identify which training dataset affects the model's outputs. Ideally, we can measure the influence of each dataset by removing it from training; however, it is prohibitively expensive to retrain a model multiple times. This paper presents UnTrac, which estimates the influence of a training dataset by unlearning it from the trained model. UnTrac is extremely simple; each training dataset is unlearned by gradient ascent, and we evaluate how much the model's predictions change after unlearning. We empirically examine if our methods can assess the influence of pretraining datasets on generating toxic, biased, and untruthful content. Experimental results demonstrate that our method estimates their influence much more accurately than existing methods while requiring neither excessive memory space nor multiple model checkpoints.
- [1555] arXiv:2401.15245 (cross-list from cs.GR) [ pdf , other ]
-
Title: GenPluSSS: A Genetic Algorithm Based Plugin for Measured Subsurface Scattering RepresentationSubjects: Graphics (cs.GR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: This paper presents a plugin that adds a representation of homogeneous and heterogeneous, optically thick, translucent materials on the Blender 3D modeling tool. The working principle of this plugin is based on a combination of Genetic Algorithm (GA) and Singular Value Decomposition (SVD)-based subsurface scattering method (GenSSS). The proposed plugin has been implemented using Mitsuba renderer, which is an open source rendering software. The proposed plugin has been validated on measured subsurface scattering data. It's shown that the proposed plugin visualizes homogeneous and heterogeneous subsurface scattering effects, accurately, compactly and computationally efficiently.
- [1556] arXiv:2401.15268 (cross-list from cs.LG) [ src ]
-
Title: Towards Stable Preferences for Stakeholder-aligned Machine LearningAuthors: Haleema Sheraz , Stefan C. Kremer , Joshua August Skorburg , Graham Taylor , Walter Sinnott-Armstrong , Kyle BoerstlerComments: Work in ProgressSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In response to the pressing challenge of kidney allocation, characterized by growing demands for organs, this research sets out to develop a data-driven solution to this problem, which also incorporates stakeholder values. The primary objective of this study is to create a method for learning both individual and group-level preferences pertaining to kidney allocations. Drawing upon data from the 'Pairwise Kidney Patient Online Survey.' Leveraging two distinct datasets and evaluating across three levels - Individual, Group and Stability - we employ machine learning classifiers assessed through several metrics. The Individual level model predicts individual participant preferences, the Group level model aggregates preferences across participants, and the Stability level model, an extension of the Group level, evaluates the stability of these preferences over time. By incorporating stakeholder preferences into the kidney allocation process, we aspire to advance the ethical dimensions of organ transplantation, contributing to more transparent and equitable practices while promoting the integration of moral values into algorithmic decision-making.
- [1557] arXiv:2401.15269 (cross-list from cs.CL) [ pdf , other ]
-
Title: Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Recent proprietary large language models (LLMs), such as GPT-4, have achieved a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generations. To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed by searching documents from the knowledge corpus and appending them unconditionally or selectively to the input of LLMs for generation. However, when applying existing methods to different domain-specific problems, poor generalization becomes apparent, leading to fetching incorrect documents or making inaccurate judgments. In this paper, we introduce Self-BioRAG, a framework reliable for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting generated responses. We utilize 84k filtered biomedical instruction sets to train Self-BioRAG that can assess its generated explanations with customized reflective tokens. Our work proves that domain-specific components, such as a retriever, domain-related document corpus, and instruction sets are necessary for adhering to domain-related instructions. Using three major medical question-answering benchmark datasets, experimental results of Self-BioRAG demonstrate significant performance gains by achieving a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with a parameter size of 7B or less. Overall, we analyze that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge as a medical expert does. We release our data and code for training our framework components and model weights (7B and 13B) to enhance capabilities in biomedical and clinical domains.
- [1558] arXiv:2401.15270 (cross-list from cs.LG) [ pdf , other ]
-
Title: SimFair: Physics-Guided Fairness-Aware Learning with Simulation ModelsComments: Accepted to AAAI 2024 (preprint)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Fairness-awareness has emerged as an essential building block for the responsible use of artificial intelligence in real applications. In many cases, inequity in performance is due to the change in distribution over different regions. While techniques have been developed to improve the transferability of fairness, a solution to the problem is not always feasible with no samples from the new regions, which is a bottleneck for pure data-driven attempts. Fortunately, physics-based mechanistic models have been studied for many problems with major social impacts. We propose SimFair, a physics-guided fairness-aware learning framework, which bridges the data limitation by integrating physical-rule-based simulation and inverse modeling into the training design. Using temperature prediction as an example, we demonstrate the effectiveness of the proposed SimFair in fairness preservation.
- [1559] arXiv:2401.15284 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Five ethical principles for generative AI in scientific researchAuthors: Zhicheng LinComments: 9 pages, 2 tablesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Generative artificial intelligence tools like large language models are rapidly transforming academic research and real world applications. However, discussions on ethical guidelines for generative AI in science remain fragmented, underscoring the urgent need for consensus based standards. This paper offers an initial framework by developing analyses and mitigation strategies across five key themes: understanding model limitations regarding truthfulness and bias; respecting privacy, confidentiality, and copyright; avoiding plagiarism and policy violations when incorporating model output; ensuring applications provide overall benefit; and using AI transparently and reproducibly. Common scenarios are outlined to demonstrate potential ethical violations. We argue that global consensus coupled with professional training and reasonable enforcement are critical to promoting the benefits of AI while safeguarding research integrity.
- [1560] arXiv:2401.15293 (cross-list from cs.CV) [ pdf , other ]
-
Title: SkipViT: Speeding Up Vision Transformers with a Token-Level Skip ConnectionAuthors: Foozhan Ataiefard , Walid Ahmed , Habib Hajimolahoseini , Saina Asani , Farnoosh Javadi , Mohammad Hassanpour , Omar Mohamed Awad , Austin Wen , Kangling Liu , Yang LiuSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Vision transformers are known to be more computationally and data-intensive than CNN models. These transformer models such as ViT, require all the input image tokens to learn the relationship among them. However, many of these tokens are not informative and may contain irrelevant information such as unrelated background or unimportant scenery. These tokens are overlooked by the multi-head self-attention (MHSA), resulting in many redundant and unnecessary computations in MHSA and the feed-forward network (FFN). In this work, we propose a method to optimize the amount of unnecessary interactions between unimportant tokens by separating and sending them through a different low-cost computational path. Our method does not add any parameters to the ViT model and aims to find the best trade-off between training throughput and achieving a 0% loss in the Top-1 accuracy of the final model. Our experimental results on training ViT-small from scratch show that SkipViT is capable of effectively dropping 55% of the tokens while gaining more than 13% training throughput and maintaining classification accuracy at the level of the baseline model on Huawei Ascend910A.
- [1561] arXiv:2401.15296 (cross-list from cs.CV) [ pdf , other ]
-
Title: A Survey on 3D Skeleton Based Person Re-Identification: Approaches, Designs, Challenges, and Future DirectionsComments: A up-to-date resource (papers, codes, data, etc.) of this survey is provided at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Person re-identification via 3D skeletons is an important emerging research area that triggers great interest in the pattern recognition community. With distinctive advantages for many application scenarios, a great diversity of 3D skeleton based person re-identification (SRID) methods have been proposed in recent years, effectively addressing prominent problems in skeleton modeling and feature learning. Despite recent advances, to the best of our knowledge, little effort has been made to comprehensively summarize these studies and their challenges. In this paper, we attempt to fill this gap by providing a systematic survey on current SRID approaches, model designs, challenges, and future directions. Specifically, we first formulate the SRID problem, and propose a taxonomy of SRID research with a summary of benchmark datasets, commonly-used model architectures, and an analytical review of different methods' characteristics. Then, we elaborate on the design principles of SRID models from multiple aspects to offer key insights for model improvement. Finally, we identify critical challenges confronting current studies and discuss several promising directions for future research of SRID.
- [1562] arXiv:2401.15299 (cross-list from cs.LG) [ pdf , other ]
-
Title: SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural NetworksComments: 9 pages, 8 figures; Accepted to 4th workshop on Graphs and more Complex structures for Learning and Reasoning, colocated with AAAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Systems and Control (eess.SY); Applications (stat.AP)
Abstract: Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graph-like in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback in this approach lies in the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problems using GNNs. To address the issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of factory issues. By utilizing this dataset, researchers can employ GNNs to address numerous supply chain problems, thereby advancing the field of supply chain analytics and planning. Source: this https URL
- [1563] arXiv:2401.15318 (cross-list from cs.GR) [ pdf , other ]
-
Title: Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian SplattingAuthors: Yutao Feng , Xiang Feng , Yintong Shang , Ying Jiang , Chang Yu , Zeshun Zong , Tianjia Shao , Hongzhi Wu , Kun Zhou , Chenfanfu Jiang , Yin YangSubjects: Graphics (cs.GR) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: We demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of the Gaussian splatting and position-based dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesive manner. Similar to Gaussian shader, we enhance each Gaussian kernel with an added normal, aligning the kernel's orientation with the surface normal to refine the PBD simulation. This approach effectively eliminates spiky noises that arise from rotational deformation in solids. It also allows us to integrate physically based rendering to augment the dynamic surface reflections on fluids. Consequently, our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views. For more information, please visit our project page at \url{ this https URL }.
- [1564] arXiv:2401.15323 (cross-list from cs.SD) [ pdf , other ]
-
Title: Music Auto-Tagging with Robust Music Representation Learned via Domain Adversarial TrainingComments: 5 pages, 3 figures, accepted to ICASSP 2024Subjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
Abstract: Music auto-tagging is crucial for enhancing music discovery and recommendation. Existing models in Music Information Retrieval (MIR) struggle with real-world noise such as environmental and speech sounds in multimedia content. This study proposes a method inspired by speech-related tasks to enhance music auto-tagging performance in noisy settings. The approach integrates Domain Adversarial Training (DAT) into the music domain, enabling robust music representations that withstand noise. Unlike previous research, this approach involves an additional pretraining phase for the domain classifier, to avoid performance degradation in the subsequent phase. Adding various synthesized noisy music data improves the model's generalization across different noise levels. The proposed architecture demonstrates enhanced performance in music auto-tagging by effectively utilizing unlabeled noisy music data. Additional experiments with supplementary unlabeled data further improves the model's performance, underscoring its robust generalization capabilities and broad applicability.
- [1565] arXiv:2401.15335 (cross-list from cs.CR) [ pdf , other ]
-
Title: L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial AttacksComments: Under Review of IJCNN 2024Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: In the rapidly evolving field of machine learning, adversarial attacks present a significant challenge to model robustness and security. Decision-based attacks, which only require feedback on the decision of a model rather than detailed probabilities or scores, are particularly insidious and difficult to defend against. This work introduces L-AutoDA (Large Language Model-based Automated Decision-based Adversarial Attacks), a novel approach leveraging the generative capabilities of Large Language Models (LLMs) to automate the design of these attacks. By iteratively interacting with LLMs in an evolutionary framework, L-AutoDA automatically designs competitive attack algorithms efficiently without much human effort. We demonstrate the efficacy of L-AutoDA on CIFAR-10 dataset, showing significant improvements over baseline methods in both success rate and computational efficiency. Our findings underscore the potential of language models as tools for adversarial attack generation and highlight new avenues for the development of robust AI systems.
- [1566] arXiv:2401.15337 (cross-list from cs.LG) [ pdf , other ]
-
Title: Deep Learning with Information Fusion and Model Interpretation for Health Monitoring of Fetus based on Long-term Prenatal Electronic Fetal Heart Rate Monitoring DataAuthors: Zenghui Lin , Xintong Liu , Nan Wang , Ruichen Li , Qingao Liu , Jingying Ma , Liwei Wang , Yan Wang , Shenda HongSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Long-term fetal heart rate (FHR) monitoring during the antepartum period, increasingly popularized by electronic FHR monitoring, represents a growing approach in FHR monitoring. This kind of continuous monitoring, in contrast to the short-term one, collects an extended period of fetal heart data. This offers a more comprehensive understanding of fetus's conditions. However, the interpretation of long-term antenatal fetal heart monitoring is still in its early stages, lacking corresponding clinical standards. Furthermore, the substantial amount of data generated by continuous monitoring imposes a significant burden on clinical work when analyzed manually. To address above challenges, this study develops an automatic analysis system named LARA (Long-term Antepartum Risk Analysis system) for continuous FHR monitoring, combining deep learning and information fusion methods. LARA's core is a well-established convolutional neural network (CNN) model. It processes long-term FHR data as input and generates a Risk Distribution Map (RDM) and Risk Index (RI) as the analysis results. We evaluate LARA on inner test dataset, the performance metrics are as follows: AUC 0.872, accuracy 0.816, specificity 0.811, sensitivity 0.806, precision 0.271, and F1 score 0.415. In our study, we observe that long-term FHR monitoring data with higher RI is more likely to result in adverse outcomes (p=0.0021). In conclusion, this study introduces LARA, the first automated analysis system for long-term FHR monitoring, initiating the further explorations into its clinical value in the future.
- [1567] arXiv:2401.15347 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Comprehensive Survey of Compression Algorithms for Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: How can we compress language models without sacrificing accuracy? The number of compression algorithms for language models is rapidly growing to benefit from remarkable advances of recent language models without side effects due to the gigantic size of language models, such as increased carbon emissions and expensive maintenance fees. While numerous compression algorithms have shown remarkable progress in compressing language models, it ironically becomes challenging to capture emerging trends and identify the fundamental concepts underlying them due to the excessive number of algorithms. In this paper, we survey and summarize diverse compression algorithms including pruning, quantization, knowledge distillation, low-rank approximation, parameter sharing, and efficient architecture design. We not only summarize the overall trend of diverse compression algorithms but also select representative algorithms and provide in-depth analyses of them. We discuss the value of each category of compression algorithms, and the desired properties of low-cost compression algorithms which have a significant impact due to the emergence of large language models. Finally, we introduce promising future research topics based on our survey results.
- [1568] arXiv:2401.15351 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Survey on Neural Topic Models: Methods, Applications, and ChallengesComments: Accepted to Artifcial Intelligence Review. See this https URL and a paper list at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Topic models have been prevalent for decades to discover latent topics and infer topic proportions of documents in an unsupervised fashion. They have been widely used in various applications like text analysis and context recommendation. Recently, the rise of neural networks has facilitated the emergence of a new research field -- Neural Topic Models (NTMs). Different from conventional topic models, NTMs directly optimize parameters without requiring model-specific derivations. This endows NTMs with better scalability and flexibility, resulting in significant research attention and plentiful new methods and applications. In this paper, we present a comprehensive survey on neural topic models concerning methods, applications, and challenges. Specifically, we systematically organize current NTM methods according to their network structures and introduce the NTMs for various scenarios like short texts and cross-lingual documents. We also discuss a wide range of popular applications built on NTMs. Finally, we highlight the challenges confronted by NTMs to inspire future research.
- [1569] arXiv:2401.15378 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: A RAG-based Question Answering System Proposal for Understanding Islam: MufassirQAS LLMSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Challenges exist in learning and understanding religions, such as the complexity and depth of religious doctrines and teachings. Chatbots as question-answering systems can help in solving these challenges. LLM chatbots use NLP techniques to establish connections between topics and accurately respond to complex questions. These capabilities make it perfect for enlightenment on religion as a question-answering chatbot. However, LLMs also tend to generate false information, known as hallucination. Also, the chatbots' responses can include content that insults personal religious beliefs, interfaith conflicts, and controversial or sensitive topics. It must avoid such cases without promoting hate speech or offending certain groups of people or their beliefs. This study uses a vector database-based Retrieval Augmented Generation (RAG) approach to enhance the accuracy and transparency of LLMs. Our question-answering system is called "MufassirQAS". We created a database consisting of several open-access books that include Turkish context. These books contain Turkish translations and interpretations of Islam. This database is utilized to answer religion-related questions and ensure our answers are trustworthy. The relevant part of the dataset, which LLM also uses, is presented along with the answer. We have put careful effort into creating system prompts that give instructions to prevent harmful, offensive, or disrespectful responses to respect people's values and provide reliable results. The system answers and shares additional information, such as the page number from the respective book and the articles referenced for obtaining the information. MufassirQAS and ChatGPT are also tested with sensitive questions. We got better performance with our system. Study and enhancements are still in progress. Results and future works are given.
- [1570] arXiv:2401.15390 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: A microservice architecture for real-time IoT data processing: A reusable Web of things approach for smart portsAuthors: Guadalupe Ortiz , Juan Boubeta-Puig , Javier Criado , David Corral-Plaza , Alfonso Garcia-de-Prado , Inmaculada Medina-Bulo , Luis IribarneJournal-ref: Comput.Stand.Interfaces 81:103604(2022)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Major advances in telecommunications and the Internet of Things have given rise to numerous smart city scenarios in which smart services are provided. What was once a dream for the future has now become reality. However, the need to provide these smart services quickly, efficiently, in an interoperable manner and in real time is a cutting-edge technological challenge. Although some software architectures offer solutions in this area, these are often limited in terms of reusability and maintenance by independent modules, involving the need for system downtime when maintaining or evolving, as well as by a lack of standards in terms of the interoperability of their interface. In this paper, we propose a fully reusable microservice architecture, standardized through the use of the Web of things paradigm, and with high efficiency in real-time data processing, supported by complex event processing techniques. To illustrate the proposal, we present a fully reusable implementation of the microservices necessary for the deployment of the architecture in the field of air quality monitoring and alerting in smart ports. The performance evaluation of this architecture shows excellent results.
- [1571] arXiv:2401.15463 (cross-list from cs.CL) [ pdf , other ]
-
Title: DataFrame QA: A Universal LLM Framework on DataFrame Question Answering Without Data ExposureSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces DataFrame question answering (QA), a novel task that utilizes large language models (LLMs) to generate Pandas queries for information retrieval and data analysis on dataframes, emphasizing safe and non-revealing data handling. Our method, which solely relies on dataframe column names, not only ensures data privacy but also significantly reduces the context window in the prompt, streamlining information processing and addressing major challenges in LLM-based data analysis. We propose DataFrame QA as a comprehensive framework that includes safe Pandas query generation and code execution. Various LLMs, notably GPT-4, are evaluated using the pass@1 metric on the renowned WikiSQL and our newly developed 'UCI-DataFrameQA', tailored for complex data analysis queries. Our findings indicate that GPT-4 achieves pass@1 rates of 86% on WikiSQL and 97% on UCI-DataFrameQA, underscoring its capability in securely retrieving and aggregating dataframe values and conducting sophisticated data analyses. This approach, deployable in a zero-shot manner without prior training or adjustments, proves to be highly adaptable and secure for diverse applications.
- [1572] arXiv:2401.15469 (cross-list from cs.LG) [ pdf , other ]
-
Title: Wind speed super-resolution and validation: from ERA5 to CERRA via diffusion modelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph)
Abstract: The Copernicus Regional Reanalysis for Europe, CERRA, is a high-resolution regional reanalysis dataset for the European domain. In recent years it has shown significant utility across various climate-related tasks, ranging from forecasting and climate change research to renewable energy prediction, resource management, air quality risk assessment, and the forecasting of rare events, among others. Unfortunately, the availability of CERRA is lagging two years behind the current date, due to constraints in acquiring the requisite external data and the intensive computational demands inherent in its generation. As a solution, this paper introduces a novel method using diffusion models to approximate CERRA downscaling in a data-driven manner, without additional informations. By leveraging the lower resolution ERA5 dataset, which provides boundary conditions for CERRA, we approach this as a super-resolution task. Focusing on wind speed around Italy, our model, trained on existing CERRA data, shows promising results, closely mirroring original CERRA data. Validation with in-situ observations further confirms the model's accuracy in approximating ground measurements.
- [1573] arXiv:2401.15480 (cross-list from cs.LG) [ pdf , other ]
-
Title: Social Interpretable Reinforcement LearningComments: 9 pages, 6 figures, submitted to IJCAI 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: Reinforcement Learning (RL) bears the promise of being an enabling technology for many applications. However, since most of the literature in the field is currently focused on opaque models, the use of RL in high-stakes scenarios, where interpretability is crucial, is still limited. Recently, some approaches to interpretable RL, e.g., based on Decision Trees, have been proposed, but one of the main limitations of these techniques is their training cost. To overcome this limitation, we propose a new population-based method, called Social Interpretable RL (SIRL), inspired by social learning principles, to improve learning efficiency. Our method mimics a social learning process, where each agent in a group learns to solve a given task based both on its own individual experience as well as the experience acquired together with its peers. Our approach is divided into two phases. In the \emph{collaborative phase}, all the agents in the population interact with a shared instance of the environment, where each agent observes the state and independently proposes an action. Then, voting is performed to choose the action that will actually be performed in the environment. In the \emph{individual phase}, each agent refines its individual performance by interacting with its own instance of the environment. This mechanism makes the agents experience a larger number of episodes while simultaneously reducing the computational cost of the process. Our results on six well-known benchmarks show that SIRL reaches state-of-the-art performance w.r.t. the alternative interpretable methods from the literature.
- [1574] arXiv:2401.15487 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Artificial Intelligence: Arguments for Catastrophic RiskComments: 12 pagesSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: Recent progress in artificial intelligence (AI) has drawn attention to the technology's transformative potential, including what some see as its prospects for causing large-scale harm. We review two influential arguments purporting to show how AI could pose catastrophic risks. The first argument -- the Problem of Power-Seeking -- claims that, under certain assumptions, advanced AI systems are likely to engage in dangerous power-seeking behavior in pursuit of their goals. We review reasons for thinking that AI systems might seek power, that they might obtain it, that this could lead to catastrophe, and that we might build and deploy such systems anyway. The second argument claims that the development of human-level AI will unlock rapid further progress, culminating in AI systems far more capable than any human -- this is the Singularity Hypothesis. Power-seeking behavior on the part of such systems might be particularly dangerous. We discuss a variety of objections to both arguments and conclude by assessing the state of the debate.
- [1575] arXiv:2401.15489 (cross-list from cs.CV) [ pdf , other ]
-
Title: Distilling Privileged Multimodal Information for Expression Recognition using Optimal TransportAuthors: Muhammad Haseeb Aslam , Muhammad Osama Zeeshan , Soufiane Belharbi , Marco Pedersoli , Alessandro Koerich , Simon Bacon , Eric GrangerSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Multimodal affect recognition models have reached remarkable performance in the lab environment due to their ability to model complementary and redundant semantic information. However, these models struggle in the wild, mainly because of the unavailability or quality of modalities used for training. In practice, only a subset of the training-time modalities may be available at test time. Learning with privileged information (PI) enables deep learning models (DL) to exploit data from additional modalities only available during training. State-of-the-art knowledge distillation (KD) methods have been proposed to distill multiple teacher models (each trained on a modality) to a common student model. These privileged KD methods typically utilize point-to-point matching and have no explicit mechanism to capture the structural information in the teacher representation space formed by introducing the privileged modality. We argue that encoding this same structure in the student space may lead to enhanced student performance. This paper introduces a new structural KD mechanism based on optimal transport (OT), where entropy-regularized OT distills the structural dark knowledge. Privileged KD with OT (PKDOT) method captures the local structures in the multimodal teacher representation by calculating a cosine similarity matrix and selects the top-k anchors to allow for sparse OT solutions, resulting in a more stable distillation process. Experiments were performed on two different problems: pain estimation on the Biovid dataset (ordinal classification) and arousal-valance prediction on the Affwild2 dataset (regression). Results show that the proposed method can outperform state-of-the-art privileged KD methods on these problems. The diversity of different modalities and fusion architectures indicates that the proposed PKDOT method is modality and model-agnostic.
- [1576] arXiv:2401.15496 (cross-list from cs.CL) [ pdf , other ]
-
Title: Baichuan2-Sum: Instruction Finetune Baichuan2-7B Model for Dialogue SummarizationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large language models (LLMs) like Llama, Baichuan and Bloom models show remarkable ability with instruction fine-tuning in many natural language tasks. Nevertheless, for the dialogue summarization task, which aims to generate summaries for different roles in dialogue, most of the state-of-the-art methods conduct on small models (e.g Bart and Bert). Existing methods try to add task specified optimization on small models like adding global-local centrality score to models. In this paper, we propose an instruction fine-tuning model: Baichuan2-Sum, for role-oriented diaglouge summarization. By setting different instructions for different roles, the model can learn from the dialogue interactions and output the expected summaries. Furthermore, we applied NEFTune technique to add suitable noise during training to improve the results. The experiments demonstrate that the proposed model achieves the new state-of-the-art results on two public dialogue summarization datasets: CSDS and SAMSUM. We release our model and related codes to facilitate future studies on dialogue summarization task.
- [1577] arXiv:2401.15509 (cross-list from cs.CL) [ pdf , other ]
-
Title: Style-News: Incorporating Stylized News Generation and Adversarial Verification for Neural Fake News DetectionComments: EACL 2024 Main TrackSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: With the improvements in generative models, the issues of producing hallucinations in various domains (e.g., law, writing) have been brought to people's attention due to concerns about misinformation. In this paper, we focus on neural fake news, which refers to content generated by neural networks aiming to mimic the style of real news to deceive people. To prevent harmful disinformation spreading fallaciously from malicious social media (e.g., content farms), we propose a novel verification framework, Style-News, using publisher metadata to imply a publisher's template with the corresponding text types, political stance, and credibility. Based on threat modeling aspects, a style-aware neural news generator is introduced as an adversary for generating news content conditioning for a specific publisher, and style and source discriminators are trained to defend against this attack by identifying which publisher the style corresponds with, and discriminating whether the source of the given news is human-written or machine-generated. To evaluate the quality of the generated content, we integrate various dimensional metrics (language fluency, content preservation, and style adherence) and demonstrate that Style-News significantly outperforms the previous approaches by a margin of 0.35 for fluency, 15.24 for content, and 0.38 for style at most. Moreover, our discriminative model outperforms state-of-the-art baselines in terms of publisher prediction (up to 4.64%) and neural fake news detection (+6.94% $\sim$ 31.72%).
- [1578] arXiv:2401.15545 (cross-list from cs.SE) [ pdf , other ]
-
Title: PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation ModelsComments: This paper has been accepted to The ACM International Conference on the Foundations of Software Engineering FSE 2024Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Programming Languages (cs.PL)
Abstract: In recent times, a plethora of Large Code Generation Models (LCGMs) have been proposed, showcasing significant potential in assisting developers with complex programming tasks. Benchmarking LCGMs necessitates the creation of a set of diverse programming problems, and each problem comprises the prompt (including the task description), canonical solution, and test inputs. The existing methods for constructing such a problem set can be categorized into two main types: manual methods and perturbation-based methods. However, manual methods demand high effort and lack scalability, while also risking data integrity due to LCGMs' potentially contaminated data collection, and perturbation-based approaches mainly generate semantically homogeneous problems with the same canonical solutions and introduce typos that can be easily auto-corrected by IDE, making them ineffective and unrealistic. In this work, we propose the idea of programming problem merging (PPM) and provide two implementation of this idea, we utilize our tool on two widely-used datasets and compare it against nine baseline methods using eight code generation models. The results demonstrate the effectiveness of our tool in generating more challenging, diverse, and natural programming problems, comparing to the baselines.
- [1579] arXiv:2401.15564 (cross-list from eess.SY) [ pdf , ps , other ]
-
Title: Design of UAV flight state recognition and trajectory prediction system based on trajectory feature constructionSubjects: Systems and Control (eess.SY) ; Artificial Intelligence (cs.AI)
Abstract: With the impact of artificial intelligence on the traditional UAV industry, autonomous UAV flight has become a current hot research field. Based on the demand for research on critical technologies for autonomous flying UAVs, this paper addresses the field of flight state recognition and trajectory prediction of UAVs. This paper proposes a method to improve the accuracy of UAV trajectory prediction based on UAV flight state recognition and verifies it using two prediction models. Firstly, UAV flight data acquisition and data preprocessing are carried out; secondly, UAV flight trajectory features are extracted based on data fusion and a UAV flight state recognition model based on PCA-DAGSVM model is established; finally, two UAV flight trajectory prediction models are established and the trajectory prediction errors of the two prediction models are compared and analyzed after flight state recognition. The results show that: 1) the UAV flight state recognition model based on PCA-DAGSVM has good recognition effect. 2) compared with the traditional UAV trajectory prediction model, the prediction model based on flight state recognition can effectively reduce the prediction error.
- [1580] arXiv:2401.15568 (cross-list from cs.CV) [ pdf , other ]
-
Title: Intriguing Equivalence Structures of the Embedding Space of Vision TransformersComments: 8 pages, 9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Pre-trained large foundation models play a central role in the recent surge of artificial intelligence, resulting in fine-tuned models with remarkable abilities when measured on benchmark datasets, standard exams, and applications. Due to their inherent complexity, these models are not well understood. While small adversarial inputs to such models are well known, the structures of the representation space are not well characterized despite their fundamental importance. In this paper, using the vision transformers as an example due to the continuous nature of their input space, we show via analyses and systematic experiments that the representation space consists of large piecewise linear subspaces where there exist very different inputs sharing the same representations, and at the same time, local normal spaces where there are visually indistinguishable inputs having very different representations. The empirical results are further verified using the local directional estimations of the Lipschitz constants of the underlying models. Consequently, the resulting representations change the results of downstream models, and such models are subject to overgeneralization and with limited semantically meaningful generalization capability.
- [1581] arXiv:2401.15617 (cross-list from cs.LG) [ pdf , other ]
-
Title: Diffusion-based graph generative methodsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Being the most cutting-edge generative methods, diffusion methods have shown great advances in wide generation tasks. Among them, graph generation attracts significant research attention for its broad application in real life. In our survey, we systematically and comprehensively review on diffusion-based graph generative methods. We first make a review on three mainstream paradigms of diffusion methods, which are denoising diffusion probabilistic models, score-based genrative models, and stochastic differential equations. Then we further categorize and introduce the latest applications of diffusion models on graphs. In the end, we point out some limitations of current studies and future directions of future explorations. The summary of existing methods metioned in this survey is in this https URL .
- [1582] arXiv:2401.15620 (cross-list from cs.RO) [ pdf , other ]
-
Title: Data-Driven Strategies for Coping with Incomplete DVL MeasurementsSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP); Systems and Control (eess.SY)
Abstract: Autonomous underwater vehicles are specialized platforms engineered for deep underwater operations. Critical to their functionality is autonomous navigation, typically relying on an inertial navigation system and a Doppler velocity log. In real-world scenarios, incomplete Doppler velocity log measurements occur, resulting in positioning errors and mission aborts. To cope with such situations, a model and learning approaches were derived. This paper presents a comparative analysis of two cutting-edge deep learning methodologies, namely LiBeamsNet and MissBeamNet, alongside a model-based average estimator. These approaches are evaluated for their efficacy in regressing missing Doppler velocity log beams when two beams are unavailable. In our study, we used data recorded by a DVL mounted on an autonomous underwater vehicle operated in the Mediterranean Sea. We found that both deep learning architectures outperformed model-based approaches by over 16% in velocity prediction accuracy.
- [1583] arXiv:2401.15625 (cross-list from cs.CR) [ pdf , other ]
-
Title: Generative AI-enabled Blockchain Networks: Fundamentals, Applications, and Case StudyAuthors: Cong T. Nguyen , Yinqiu Liu , Hongyang Du , Dinh Thai Hoang , Dusit Niyato , Diep N. Nguyen , Shiwen MaoSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Generative Artificial Intelligence (GAI) has recently emerged as a promising solution to address critical challenges of blockchain technology, including scalability, security, privacy, and interoperability. In this paper, we first introduce GAI techniques, outline their applications, and discuss existing solutions for integrating GAI into blockchains. Then, we discuss emerging solutions that demonstrate the effectiveness of GAI in addressing various challenges of blockchain, such as detecting unknown blockchain attacks and smart contract vulnerabilities, designing key secret sharing schemes, and enhancing privacy. Moreover, we present a case study to demonstrate that GAI, specifically the generative diffusion model, can be employed to optimize blockchain network performance metrics. Experimental results clearly show that, compared to a baseline traditional AI approach, the proposed generative diffusion model approach can converge faster, achieve higher rewards, and significantly improve the throughput and latency of the blockchain network. Additionally, we highlight future research directions for GAI in blockchain applications, including personalized GAI-enabled blockchains, GAI-blockchain synergy, and privacy and security considerations within blockchain ecosystems.
- [1584] arXiv:2401.15626 (cross-list from cs.CL) [ pdf , other ]
-
Title: TA&AT: Enhancing Task-Oriented Dialog with Turn-Level Auxiliary Tasks and Action-Tree Based Scheduled SamplingComments: Accepted by AAAI 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Task-oriented dialog systems have witnessed substantial progress due to conversational pre-training techniques. Yet, two significant challenges persist. First, most systems primarily utilize the latest turn's state label for the generator. This practice overlooks the comprehensive value of state labels in boosting the model's understanding for future generations. Second, an overreliance on generated policy often leads to error accumulation, resulting in suboptimal responses when adhering to incorrect actions. To combat these challenges, we propose turn-level multi-task objectives for the encoder. With the guidance of essential information from labeled intermediate states, we establish a more robust representation for both understanding and generation. For the decoder, we introduce an action tree-based scheduled sampling technique. Specifically, we model the hierarchical policy as trees and utilize the similarity between trees to sample negative policy based on scheduled sampling, hoping the model to generate invariant responses under perturbations. This method simulates potential pitfalls by sampling similar negative policy, bridging the gap between task-oriented dialog training and inference. Among methods without continual pre-training, our approach achieved state-of-the-art (SOTA) performance on the MultiWOZ dataset series and was also competitive with pre-trained SOTA methods.
- [1585] arXiv:2401.15647 (cross-list from cs.CV) [ pdf , other ]
-
Title: UP-CrackNet: Unsupervised Pixel-Wise Road Crack Detection via Adversarial Image RestorationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Over the past decade, automated methods have been developed to detect cracks more efficiently, accurately, and objectively, with the ultimate goal of replacing conventional manual visual inspection techniques. Among these methods, semantic segmentation algorithms have demonstrated promising results in pixel-wise crack detection tasks. However, training such data-driven algorithms requires a large amount of human-annotated datasets with pixel-level annotations, which is a highly labor-intensive and time-consuming process. Moreover, supervised learning-based methods often struggle with poor generalization ability in unseen datasets. Therefore, we propose an unsupervised pixel-wise road crack detection network, known as UP-CrackNet. Our approach first generates multi-scale square masks and randomly selects them to corrupt undamaged road images by removing certain regions. Subsequently, a generative adversarial network is trained to restore the corrupted regions by leveraging the semantic context learned from surrounding uncorrupted regions. During the testing phase, an error map is generated by calculating the difference between the input and restored images, which allows for pixel-wise crack detection. Our comprehensive experimental results demonstrate that UP-CrackNet outperforms other general-purpose unsupervised anomaly detection algorithms, and exhibits comparable performance and superior generalizability when compared with state-of-the-art supervised crack segmentation algorithms. Our source code is publicly available at mias.group/UP-CrackNet.
- [1586] arXiv:2401.15670 (cross-list from cs.CL) [ pdf , other ]
-
Title: YODA: Teacher-Student Progressive Learning for Language ModelsAuthors: Jianqiao Lu , Wanjun Zhong , Yufei Wang , Zhijiang Guo , Qi Zhu , Wenyong Huang , Yanlin Wang , Fei Mi , Baojun Wang , Yasheng Wang , Lifeng Shang , Xin Jiang , Qun LiuComments: 14 pages, 4 figures, 3 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Although large language models (LLMs) have demonstrated adeptness in a range of tasks, they still lag behind human learning efficiency. This disparity is often linked to the inherent human capacity to learn from basic examples, gradually generalize and handle more complex problems, and refine their skills with continuous feedback. Inspired by this, this paper introduces YODA, a novel teacher-student progressive learning framework that emulates the teacher-student education process to improve the efficacy of model fine-tuning. The framework operates on an interactive \textit{basic-generalized-harder} loop. The teacher agent provides tailored feedback on the student's answers, and systematically organizes the education process. This process unfolds by teaching the student basic examples, reinforcing understanding through generalized questions, and then enhancing learning by posing questions with progressively enhanced complexity. With the teacher's guidance, the student learns to iteratively refine its answer with feedback, and forms a robust and comprehensive understanding of the posed questions. The systematic procedural data, which reflects the progressive learning process of humans, is then utilized for model training. Taking math reasoning as a testbed, experiments show that training LLaMA2 with data from YODA improves SFT with significant performance gain (+17.01\% on GSM8K and +9.98\% on MATH). In addition, we find that training with curriculum learning further improves learning robustness.
- [1587] arXiv:2401.15675 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: Detection of a facemask in real-time using deep learning methods: Prevention of Covid 19Authors: Gautam Siddharth Kashyap , Jatin Sohlot , Ayesha Siddiqui , Ramsha Siddiqui , Karan Malik , Samar Wazir , Alexander E. I. BrownleeComments: Research Advances in Network Technologies (Volume 2) (CRC Press Taylor and Francis), 2023 (Accepted)Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: A health crisis is raging all over the world with the rapid transmission of the novel-coronavirus disease (Covid-19). Out of the guidelines issued by the World Health Organisation (WHO) to protect us against Covid-19, wearing a facemask is the most effective. Many countries have necessitated the wearing of face masks, but monitoring a large number of people to ensure that they are wearing masks in a crowded place is a challenging task in itself. The novel-coronavirus disease (Covid-19) has already affected our day-to-day life as well as world trade movements. By the end of April 2021, the world has recorded 144,358,956 confirmed cases of novel-coronavirus disease (Covid-19) including 3,066,113 deaths according to the world health organization (WHO). These increasing numbers motivate automated techniques for the detection of a facemask in real-time scenarios for the prevention of Covid-19. We propose a technique using deep learning that works for single and multiple people in a frame recorded via webcam in still or in motion. We have also experimented with our approach in night light. The accuracy of our model is good compared to the other approaches in the literature; ranging from 74% for multiple people in a nightlight to 99% for a single person in daylight.
- [1588] arXiv:2401.15713 (cross-list from cs.LG) [ pdf , other ]
-
Title: Contrastive Learning and Mixture of Experts Enables Precise Vector EmbeddingsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: The advancement of transformer neural networks has significantly elevated the capabilities of sentence similarity models, particularly in creating effective vector representations of natural language inputs. However, these models face notable challenges in domain-specific contexts, especially in highly specialized scientific sub-fields. Traditional methods often struggle in this regime, either overgeneralizing similarities within a niche or being overly sensitive to minor differences, resulting in inaccurate text classification and subpar vector representation. In an era where retrieval augmentation and search are increasingly crucial, precise and concise numerical representations are essential. In this paper, we target this issue by assembling niche datasets using co-citations as a similarity metric, focusing on biomedical domains. We employ two key strategies for fine-tuning state-of-the-art models: 1. Domain-specific Fine-Tuning, which tailors pretrained models to a single domain, and 2. Universal Applicability with Mixture of Experts (MoE), adapting pretrained models with enforced routing for multiple domains simultaneously. Our training approach emphasizes the use of abstracts for faster training, incorporating Multiple Negative Rankings loss for efficient contrastive learning. Notably, our MoE variants, equipped with $N$ experts, achieve the efficacy of $N$ individual models, heralding a new era of versatile, One-Size-Fits-All transformer networks for various tasks. This methodology marks significant advancements in scientific text classification metrics and holds promise for enhancing vector database search and compilation.
- [1589] arXiv:2401.15721 (cross-list from cs.CV) [ pdf , other ]
-
Title: A Study of Acquisition Functions for Medical Imaging Deep Active LearningAuthors: Bonaventure F. P. DossouComments: Best Poster Award at Deep Learning Indaba 2023 ConferenceSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Abstract: The Deep Learning revolution has enabled groundbreaking achievements in recent years. From breast cancer detection to protein folding, deep learning algorithms have been at the core of very important advancements. However, these modern advancements are becoming more and more data-hungry, especially on labeled data whose availability is scarce: this is even more prevalent in the medical context. In this work, we show how active learning could be very effective in data scarcity situations, where obtaining labeled data (or annotation budget is very limited). We compare several selection criteria (BALD, MeanSTD, and MaxEntropy) on the ISIC 2016 dataset. We also explored the effect of acquired pool size on the model's performance. Our results suggest that uncertainty is useful to the Melanoma detection task, and confirms the hypotheses of the author of the paper of interest, that \textit{bald} performs on average better than other acquisition functions. Our extended analyses however revealed that all acquisition functions perform badly on the positive (cancerous) samples, suggesting exploitation of class unbalance, which could be crucial in real-world settings. We finish by suggesting future work directions that would be useful to improve this current work. The code of our implementation is open-sourced at \url{ this https URL }
- [1590] arXiv:2401.15741 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: SERNet-Former: Semantic Segmentation by Efficient Residual Network with Attention-Boosting Gates and Attention-Fusion NetworksAuthors: Serdar ErisenSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Improving the efficiency of state-of-the-art methods in semantic segmentation requires overcoming the increasing computational cost as well as issues such as fusing semantic information from global and local contexts. Based on the recent success and problems that convolutional neural networks (CNNs) encounter in semantic segmentation, this research proposes an encoder-decoder architecture with a unique efficient residual network, Efficient-ResNet. Attention-boosting gates (AbGs) and attention-boosting modules (AbMs) are deployed by aiming to fuse the equivariant and feature-based semantic information with the equivalent sizes of the output of global context of the efficient residual network in the encoder. Respectively, the decoder network is developed with the additional attention-fusion networks (AfNs) inspired by AbM. AfNs are designed to improve the efficiency in the one-to-one conversion of the semantic information by deploying additional convolution layers in the decoder part. Our network is tested on the challenging CamVid and Cityscapes datasets, and the proposed methods reveal significant improvements on the residual networks. To the best of our knowledge, the developed network, SERNet-Former, achieves state-of-the-art results (84.62 % mean IoU) on CamVid dataset and challenging results (87.35 % mean IoU) on Cityscapes validation dataset.
- [1591] arXiv:2401.15753 (cross-list from cs.CV) [ pdf , other ]
-
Title: An objective comparison of methods for augmented reality in laparoscopic liver resection by preoperative-to-intraoperative image fusionAuthors: Sharib Ali , Yamid Espinel , Yueming Jin , Peng Liu , Bianca Güttner , Xukun Zhang , Lihua Zhang , Tom Dowrick , Matthew J. Clarkson , Shiting Xiao , Yifan Wu , Yijun Yang , Lei Zhu , Dai Sun , Lan Li , Micha Pfeiffer , Shahid Farid , Lena Maier-Hein , Emmanuel Buc , Adrien BartoliComments: 24 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
Abstract: Augmented reality for laparoscopic liver resection is a visualisation mode that allows a surgeon to localise tumours and vessels embedded within the liver by projecting them on top of a laparoscopic image. Preoperative 3D models extracted from CT or MRI data are registered to the intraoperative laparoscopic images during this process. In terms of 3D-2D fusion, most of the algorithms make use of anatomical landmarks to guide registration. These landmarks include the liver's inferior ridge, the falciform ligament, and the occluding contours. They are usually marked by hand in both the laparoscopic image and the 3D model, which is time-consuming and may contain errors if done by a non-experienced user. Therefore, there is a need to automate this process so that augmented reality can be used effectively in the operating room. We present the Preoperative-to-Intraoperative Laparoscopic Fusion Challenge (P2ILF), held during the Medical Imaging and Computer Assisted Interventions (MICCAI 2022) conference, which investigates the possibilities of detecting these landmarks automatically and using them in registration. The challenge was divided into two tasks: 1) A 2D and 3D landmark detection task and 2) a 3D-2D registration task. The teams were provided with training data consisting of 167 laparoscopic images and 9 preoperative 3D models from 9 patients, with the corresponding 2D and 3D landmark annotations. A total of 6 teams from 4 countries participated, whose proposed methods were evaluated on 16 images and two preoperative 3D models from two patients. All the teams proposed deep learning-based methods for the 2D and 3D landmark segmentation tasks and differentiable rendering-based methods for the registration task. Based on the experimental outcomes, we propose three key hypotheses that determine current limitations and future directions for research in this domain.
- [1592] arXiv:2401.15766 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: EEG for fatigue monitoringAuthors: Ildar RakhmatulinSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Physiological fatigue, a state of reduced cognitive and physical performance resulting from prolonged mental or physical exertion, poses significant challenges in various domains, including healthcare, aviation, transportation, and industrial sectors. As the understanding of fatigue's impact on human performance grows, there is a growing interest in developing effective fatigue monitoring techniques. Among these techniques, electroencephalography (EEG) has emerged as a promising tool for objectively assessing physiological fatigue due to its non-invasiveness, high temporal resolution, and sensitivity to neural activity. This paper aims to provide a comprehensive analysis of the current state of the use of EEG for monitoring physiological fatigue.
- [1593] arXiv:2401.15773 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Evaluation of k-means time series clustering based on z-normalization and NP-FreeComments: 12 pages, 6 figures, 8 tables, 13th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Despite the widespread use of k-means time series clustering in various domains, there exists a gap in the literature regarding its comprehensive evaluation with different time series normalization approaches. This paper seeks to fill this gap by conducting a thorough performance evaluation of k-means time series clustering on real-world open-source time series datasets. The evaluation focuses on two distinct normalization techniques: z-normalization and NP-Free. The former is one of the most commonly used normalization approach for time series. The latter is a real-time time series representation approach, which can serve as a time series normalization approach. The primary objective of this paper is to assess the impact of these two normalization techniques on k-means time series clustering in terms of its clustering quality. The experiments employ the silhouette score, a well-established metric for evaluating the quality of clusters in a dataset. By systematically investigating the performance of k-means time series clustering with these two normalization techniques, this paper addresses the current gap in k-means time series clustering evaluation and contributes valuable insights to the development of time series clustering.
- [1594] arXiv:2401.15803 (cross-list from cs.RO) [ pdf , other ]
-
Title: GarchingSim: An Autonomous Driving Simulator with Photorealistic Scenes and Minimalist WorkflowAuthors: Liguo Zhou , Yinglei Song , Yichao Gao , Zhou Yu , Michael Sodamin , Hongshen Liu , Liang Ma , Lian Liu , Hao Liu , Yang Liu , Haichuan Li , Guang Chen , Alois KnollSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
Abstract: Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and interdisciplinary researchers. We introduce an autonomous driving simulator with photorealistic scenes, meanwhile keeping a user-friendly workflow. The simulator is able to communicate with external algorithms through ROS2 or this http URL , making it compatible with existing software stacks. Furthermore, we implement a highly accurate vehicle dynamics model within the simulator to enhance the realism of the vehicle's physical effects. The simulator is able to serve various functions, including generating synthetic data and driving with machine learning-based algorithms. Moreover, we prioritize simplicity in the deployment process, ensuring that beginners find it approachable and user-friendly.
- [1595] arXiv:2401.15810 (cross-list from cs.SE) [ pdf , other ]
-
Title: Green Runner: A tool for efficient deep learning component selectionAuthors: Jai KannanSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: For software that relies on machine-learned functionality, model selection is key to finding the right model for the task with desired performance characteristics. Evaluating a model requires developers to i) select from many models (e.g. the Hugging face model repository), ii) select evaluation metrics and training strategy, and iii) tailor trade-offs based on the problem domain. However, current evaluation approaches are either ad-hoc resulting in sub-optimal model selection or brute force leading to wasted compute. In this work, we present \toolname, a novel tool to automatically select and evaluate models based on the application scenario provided in natural language. We leverage the reasoning capabilities of large language models to propose a training strategy and extract desired trade-offs from a problem description. \toolname~features a resource-efficient experimentation engine that integrates constraints and trade-offs based on the problem into the model selection process. Our preliminary evaluation demonstrates that \toolname{} is both efficient and accurate compared to ad-hoc evaluations and brute force. This work presents an important step toward energy-efficient tools to help reduce the environmental impact caused by the growing demand for software with machine-learned functionality.
- [1596] arXiv:2401.15820 (cross-list from cs.CV) [ pdf , other ]
-
Title: Knowledge-Aware Neuron Interpretation for Scene ClassificationComments: Accepted to AAAI2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Although neural models have achieved remarkable performance, they still encounter doubts due to the intransparency. To this end, model prediction explanation is attracting more and more attentions. However, current methods rarely incorporate external knowledge and still suffer from three limitations: (1) Neglecting concept completeness. Merely selecting concepts may not sufficient for prediction. (2) Lacking concept fusion. Failure to merge semantically-equivalent concepts. (3) Difficult in manipulating model behavior. Lack of verification for explanation on original model. To address these issues, we propose a novel knowledge-aware neuron interpretation framework to explain model predictions for image scene classification. Specifically, for concept completeness, we present core concepts of a scene based on knowledge graph, ConceptNet, to gauge the completeness of concepts. Our method, incorporating complete concepts, effectively provides better prediction explanations compared to baselines. Furthermore, for concept fusion, we introduce a knowledge graph-based method known as Concept Filtering, which produces over 23% point gain on neuron behaviors for neuron interpretation. At last, we propose Model Manipulation, which aims to study whether the core concepts based on ConceptNet could be employed to manipulate model behavior. The results show that core concepts can effectively improve the performance of original model by over 26%.
- [1597] arXiv:2401.15834 (cross-list from cs.CV) [ pdf , other ]
-
Title: Few and Fewer: Learning Better from Few Examples Using Fewer Base ClassesAuthors: Raphael Lafargue , Yassir Bendou , Bastien Pasdeloup , Jean-Philippe Diguet , Ian Reid , Vincent Gripon , Jack ValmadreComments: 9.5 pages + bibliography and supplementary materialSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: When training data is scarce, it is common to make use of a feature extractor that has been pre-trained on a large base dataset, either by fine-tuning its parameters on the ``target'' dataset or by directly adopting its representation as features for a simple classifier. Fine-tuning is ineffective for few-shot learning, since the target dataset contains only a handful of examples. However, directly adopting the features without fine-tuning relies on the base and target distributions being similar enough that these features achieve separability and generalization. This paper investigates whether better features for the target dataset can be obtained by training on fewer base classes, seeking to identify a more useful base dataset for a given task.We consider cross-domain few-shot image classification in eight different domains from Meta-Dataset and entertain multiple real-world settings (domain-informed, task-informed and uninformed) where progressively less detail is known about the target task. To our knowledge, this is the first demonstration that fine-tuning on a subset of carefully selected base classes can significantly improve few-shot learning. Our contributions are simple and intuitive methods that can be implemented in any few-shot solution. We also give insights into the conditions in which these solutions are likely to provide a boost in accuracy. We release the code to reproduce all experiments from this paper on GitHub. this https URL
- [1598] arXiv:2401.15842 (cross-list from cs.CV) [ pdf , ps , other ]
-
Title: LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question AnsweringComments: 21 pages,9 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, the LCV2 modular method is proposed for the Grounded Visual Question Answering task in the vision-language multimodal domain. This approach relies on a frozen large language model (LLM) as intermediate mediator between the off-the-shelf VQA model and the off-the-shelf visual grounding (VG) model, where the LLM transforms and conveys textual information between the two modules based on a designed prompt. LCV2 establish an integrated plug-and-play framework without the need for any pre-training process. This framework can be deployed for VQA Grounding tasks under low computational resources. The modularized model within the framework allows application with various state-of-the-art pre-trained models, exhibiting significant potential to be advance with the times. Experimental implementations were conducted under constrained computational and memory resources, evaluating the proposed method's performance on benchmark datasets including GQA, CLEVR, and VizWiz-VQA-Grounding. Comparative analyses with baseline methods demonstrate the robust competitiveness of LCV2.
- [1599] arXiv:2401.15847 (cross-list from cs.CV) [ pdf , other ]
-
Title: Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQAAuthors: Yue Fan , Jing Gu , Kaiwen Zhou , Qianqi Yan , Shan Jiang , Ching-Chen Kuo , Xinze Guan , Xin Eric WangSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Multipanel images, commonly seen as web screenshots, posters, etc., pervade our daily lives. These images, characterized by their composition of multiple subfigures in distinct layouts, effectively convey information to people. Toward building advanced multimodal AI applications, such as agents that understand complex scenes and navigate through webpages, the skill of multipanel visual reasoning is essential, and a comprehensive evaluation of models in this regard is important. Therefore, we introduce Multipanel Visual Question Answering (MultipanelVQA), a novel benchmark comprising 6,600 triplets of questions, answers, and multipanel images that specifically challenge models in comprehending multipanel images. Our evaluation shows that questions in the MultipanelVQA benchmark pose significant challenges to the state-of-the-art Large Vision Language Models (LVLMs) tested, even though humans can attain approximately 99\% accuracy on these questions. Distinctively, the MultipanelVQA benchmark features synthetically generated multipanel images specifically crafted to isolate and assess the impact of various factors, such as the layout, on LVLMs' multipanel image comprehension abilities. As a result, in addition to benchmarking the capabilities of LVLMs in understanding multipanel images, we analyze the potential causes for LVLMs' performance and offer insights for enhancement with the synthetic data. Code and data are released at this https URL .
- [1600] arXiv:2401.15856 (cross-list from cs.LG) [ pdf , other ]
-
Title: Look Around! Unexpected gains from training on environments in the vicinity of the targetAuthors: Serena Bono , Spandan Madan , Ishaan Grover , Mao Yasueda , Cynthia Breazeal , Hanspeter Pfister , Gabriel KreimanSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Solutions to Markov Decision Processes (MDP) are often very sensitive to state transition probabilities. As the estimation of these probabilities is often inaccurate in practice, it is important to understand when and how Reinforcement Learning (RL) agents generalize when transition probabilities change. Here we present a new methodology to evaluate such generalization of RL agents under small shifts in the transition probabilities. Specifically, we evaluate agents in new environments (MDPs) in the vicinity of the training MDP created by adding quantifiable, parametric noise into the transition function of the training MDP. We refer to this process as Noise Injection, and the resulting environments as $\delta$-environments. This process allows us to create controlled variations of the same environment with the level of the noise serving as a metric of distance between environments. Conventional wisdom suggests that training and testing on the same MDP should yield the best results. However, we report several cases of the opposite -- when targeting a specific environment, training the agent in an alternative noise setting can yield superior outcomes. We showcase this phenomenon across $60$ different variations of ATARI games, including PacMan, Pong, and Breakout.
- [1601] arXiv:2401.15859 (cross-list from cs.CV) [ pdf , other ]
-
Title: Diffusion Facial Forgery DetectionComments: The dataset will be released at \url{ this https URL }Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Detecting diffusion-generated images has recently grown into an emerging research area. Existing diffusion-based datasets predominantly focus on general image generation. However, facial forgeries, which pose a more severe social risk, have remained less explored thus far. To address this gap, this paper introduces DiFF, a comprehensive dataset dedicated to face-focused diffusion-generated images. DiFF comprises over 500,000 images that are synthesized using thirteen distinct generation methods under four conditions. In particular, this dataset leverages 30,000 carefully collected textual and visual prompts, ensuring the synthesis of images with both high fidelity and semantic consistency. We conduct extensive experiments on the DiFF dataset via a human test and several representative forgery detection methods. The results demonstrate that the binary detection accuracy of both human observers and automated detectors often falls below 30%, shedding light on the challenges in detecting diffusion-generated facial forgeries. Furthermore, we propose an edge graph regularization approach to effectively enhance the generalization capability of existing detectors.
- [1602] arXiv:2401.15861 (cross-list from cs.CL) [ pdf , other ]
-
Title: BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretrainingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: BERT (Bidirectional Encoder Representations from Transformers) has revolutionized the field of natural language processing through its exceptional performance on numerous tasks. Yet, the majority of researchers have mainly concentrated on enhancements related to the model structure, such as relative position embedding and more efficient attention mechanisms. Others have delved into pretraining tricks associated with Masked Language Modeling, including whole word masking. DeBERTa introduced an enhanced decoder adapted for BERT's encoder model for pretraining, proving to be highly effective. We argue that the design and research around enhanced masked language modeling decoders have been underappreciated. In this paper, we propose several designs of enhanced decoders and introduce BPDec (BERT Pretraining Decoder), a novel method for modeling training. Typically, a pretrained BERT model is fine-tuned for specific Natural Language Understanding (NLU) tasks. In our approach, we utilize the original BERT model as the encoder, making only changes to the decoder without altering the encoder. This approach does not necessitate extensive modifications to the model's architecture and can be seamlessly integrated into existing fine-tuning pipelines and services, offering an efficient and effective enhancement strategy. Compared to other methods, while we also incur a moderate training cost for the decoder during the pretraining process, our approach does not introduce additional training costs during the fine-tuning phase. We test multiple enhanced decoder structures after pretraining and evaluate their performance on the GLUE benchmark. Our results demonstrate that BPDec, having only undergone subtle refinements to the model structure during pretraining, significantly enhances model performance without escalating the inference time and serving budget.
- [1603] arXiv:2401.15863 (cross-list from cs.CV) [ pdf , other ]
-
Title: Importance-Aware Adaptive Dataset DistillationComments: Published as a journal paper in Elsevier Neural NetworksSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Herein, we propose a novel dataset distillation method for constructing small informative datasets that preserve the information of the large original datasets. The development of deep learning models is enabled by the availability of large-scale datasets. Despite unprecedented success, large-scale datasets considerably increase the storage and transmission costs, resulting in a cumbersome model training process. Moreover, using raw data for training raises privacy and copyright concerns. To address these issues, a new task named dataset distillation has been introduced, aiming to synthesize a compact dataset that retains the essential information from the large original dataset. State-of-the-art (SOTA) dataset distillation methods have been proposed by matching gradients or network parameters obtained during training on real and synthetic datasets. The contribution of different network parameters to the distillation process varies, and uniformly treating them leads to degraded distillation performance. Based on this observation, we propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance by automatically assigning importance weights to different network parameters during distillation, thereby synthesizing more robust distilled datasets. IADD demonstrates superior performance over other SOTA dataset distillation methods based on parameter matching on multiple benchmark datasets and outperforms them in terms of cross-architecture generalization. In addition, the analysis of self-adaptive weights demonstrates the effectiveness of IADD. Furthermore, the effectiveness of IADD is validated in a real-world medical application such as COVID-19 detection.
- [1604] arXiv:2401.15872 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Deep Q-Network Based on Radial Basis Functions for Multi-Echelon Inventory ManagementSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Abstract: This paper addresses a multi-echelon inventory management problem with a complex network topology where deriving optimal ordering decisions is difficult. Deep reinforcement learning (DRL) has recently shown potential in solving such problems, while designing the neural networks in DRL remains a challenge. In order to address this, a DRL model is developed whose Q-network is based on radial basis functions. The approach can be more easily constructed compared to classic DRL models based on neural networks, thus alleviating the computational burden of hyperparameter tuning. Through a series of simulation experiments, the superior performance of this approach is demonstrated compared to the simple base-stock policy, producing a better policy in the multi-echelon system and competitive performance in the serial system where the base-stock policy is optimal. In addition, the approach outperforms current DRL approaches.
- [1605] arXiv:2401.15894 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Gated MLP Architecture for Learning Topological Dependencies in Spatio-Temporal GraphsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph Neural Networks (GNNs) and Transformer have been increasingly adopted to learn the complex vector representations of spatio-temporal graphs, capturing intricate spatio-temporal dependencies crucial for applications such as traffic datasets. Although many existing methods utilize multi-head attention mechanisms and message-passing neural networks (MPNNs) to capture both spatial and temporal relations, these approaches encode temporal and spatial relations independently, and reflect the graph's topological characteristics in a limited manner. In this work, we introduce the Cycle to Mixer (Cy2Mixer), a novel spatio-temporal GNN based on topological non-trivial invariants of spatio-temporal graphs with gated multi-layer perceptrons (gMLP). The Cy2Mixer is composed of three blocks based on MLPs: A message-passing block for encapsulating spatial information, a cycle message-passing block for enriching topological information through cyclic subgraphs, and a temporal block for capturing temporal properties. We bolster the effectiveness of Cy2Mixer with mathematical evidence emphasizing that our cycle message-passing block is capable of offering differentiated information to the deep learning model compared to the message-passing block. Furthermore, empirical evaluations substantiate the efficacy of the Cy2Mixer, demonstrating state-of-the-art performances across various traffic benchmark datasets.
- [1606] arXiv:2401.15896 (cross-list from cs.CV) [ pdf , other ]
-
Title: M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient PretrainingAuthors: Qingpei Guo , Furong Xu , Hanxiao Zhang , Wang Ren , Ziping Ma , Lin Ju , Jian Wang , Jingdong Chen , Ming YangSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence. Nevertheless, VLM models supporting multi-language, e.g., in both Chinese and English, have lagged due to the relative scarcity of large-scale pretraining datasets. Toward this end, we introduce a comprehensive bilingual (Chinese-English) dataset BM-6B with over 6 billion image-text pairs, aimed at enhancing multimodal foundation models to well understand images in both languages. To handle such a scale of dataset, we propose a novel grouped aggregation approach for image-text contrastive loss computation, which reduces the communication overhead and GPU memory demands significantly, facilitating a 60% increase in training speed. We pretrain a series of bilingual image-text foundation models with an enhanced fine-grained understanding ability on BM-6B, the resulting models, dubbed as $M^2$-Encoders (pronounced "M-Square"), set new benchmarks in both languages for multimodal retrieval and classification tasks. Notably, Our largest $M^2$-Encoder-10B model has achieved top-1 accuracies of 88.5% on ImageNet and 80.7% on ImageNet-CN under a zero-shot classification setting, surpassing previously reported SoTA methods by 2.2% and 21.1%, respectively. The $M^2$-Encoder series represents one of the most comprehensive bilingual image-text foundation models to date, so we are making it available to the research community for further exploration and development.
- [1607] arXiv:2401.15914 (cross-list from cs.CV) [ pdf , other ]
-
Title: Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD GeneralizationComments: ICLR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Existing vision-language models exhibit strong generalization on a variety of visual domains and tasks. However, such models mainly perform zero-shot recognition in a closed-set manner, and thus struggle to handle open-domain visual concepts by design. There are recent finetuning methods, such as prompt learning, that not only study the discrimination between in-distribution (ID) and out-of-distribution (OOD) samples, but also show some improvements in both ID and OOD accuracies. In this paper, we first demonstrate that vision-language models, after long enough finetuning but without proper regularization, tend to overfit the known classes in the given dataset, with degraded performance on unknown classes. Then we propose a novel approach OGEN to address this pitfall, with the main focus on improving the OOD GENeralization of finetuned models. Specifically, a class-conditional feature generator is introduced to synthesize OOD features using just the class name of any unknown class. Such synthesized features will provide useful knowledge about unknowns and help regularize the decision boundary between ID and OOD data when optimized jointly. Equally important is our adaptive self-distillation mechanism to regularize our feature generation model during joint optimization, i.e., adaptively transferring knowledge between model states to further prevent overfitting. Experiments validate that our method yields convincing gains in OOD generalization performance in different settings. Code: this https URL .
- [1608] arXiv:2401.15931 (cross-list from cs.NE) [ pdf , other ]
-
Title: EmoDM: A Diffusion Model for Evolutionary Multi-objective OptimizationSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: Evolutionary algorithms have been successful in solving multi-objective optimization problems (MOPs). However, as a class of population-based search methodology, evolutionary algorithms require a large number of evaluations of the objective functions, preventing them from being applied to a wide range of expensive MOPs. To tackle the above challenge, this work proposes for the first time a diffusion model that can learn to perform evolutionary multi-objective search, called EmoDM. This is achieved by treating the reversed convergence process of evolutionary search as the forward diffusion and learn the noise distributions from previously solved evolutionary optimization tasks. The pre-trained EmoDM can then generate a set of non-dominated solutions for a new MOP by means of its reverse diffusion without further evolutionary search, thereby significantly reducing the required function evaluations. To enhance the scalability of EmoDM, a mutual entropy-based attention mechanism is introduced to capture the decision variables that are most important for the objectives. Experimental results demonstrate the competitiveness of EmoDM in terms of both the search performance and computational efficiency compared with state-of-the-art evolutionary algorithms in solving MOPs having up to 5000 decision variables. The pre-trained EmoDM is shown to generalize well to unseen problems, revealing its strong potential as a general and efficient MOP solver.
- [1609] arXiv:2401.15935 (cross-list from cs.LG) [ pdf , other ]
-
Title: Self-Supervised Learning in Event Sequences: A Comparative Study and Hybrid Approach of Generative Modeling and Contrastive LearningAuthors: Viktor Moskvoretskii , Dmitry Osin , Egor Shvetsov , Igor Udovichenko , Maxim Zhelnin , Andrey Dukhovny , Anna Zhimerikina , Albert Efimov , Evgeny BurnaevComments: 11 pages, 9 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This study investigates self-supervised learning techniques to obtain representations of Event Sequences. It is a key modality in various applications, including but not limited to banking, e-commerce, and healthcare.
We perform a comprehensive study of generative and contrastive approaches in self-supervised learning, applying them both independently. We find that there is no single supreme method. Consequently, we explore the potential benefits of combining these approaches. To achieve this goal, we introduce a novel method that aligns generative and contrastive embeddings as distinct modalities, drawing inspiration from contemporary multimodal research.
Generative and contrastive approaches are often treated as mutually exclusive, leaving a gap for their combined exploration. Our results demonstrate that this aligned model performs at least on par with, and mostly surpasses, existing methods and is more universal across a variety of tasks. Furthermore, we demonstrate that self-supervised methods consistently outperform the supervised approach on our datasets. - [1610] arXiv:2401.15944 (cross-list from cs.CV) [ pdf , other ]
-
Title: Bridging the Domain Gap: A Simple Domain Matching Method for Reference-based Image Super-Resolution in Remote SensingComments: Accepted to IEEE GRSL 2023Journal-ref: Volume: 21, Year: 2023, Page: 1-5Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recently, reference-based image super-resolution (RefSR) has shown excellent performance in image super-resolution (SR) tasks. The main idea of RefSR is to utilize additional information from the reference (Ref) image to recover the high-frequency components in low-resolution (LR) images. By transferring relevant textures through feature matching, RefSR models outperform existing single image super-resolution (SISR) models. However, their performance significantly declines when a domain gap between Ref and LR images exists, which often occurs in real-world scenarios, such as satellite imaging. In this letter, we introduce a Domain Matching (DM) module that can be seamlessly integrated with existing RefSR models to enhance their performance in a plug-and-play manner. To the best of our knowledge, we are the first to explore Domain Matching-based RefSR in remote sensing image processing. Our analysis reveals that their domain gaps often occur in different satellites, and our model effectively addresses these challenges, whereas existing models struggle. Our experiments demonstrate that the proposed DM module improves SR performance both qualitatively and quantitatively for remote sensing super-resolution tasks.
- [1611] arXiv:2401.15952 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Class-aware Optimal Transport Approach with Higher-Order Moment Matching for Unsupervised Domain AdaptationComments: 18 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. In this paper, we introduce a novel approach called class-aware optimal transport (OT), which measures the OT distance between a distribution over the source class-conditional distributions and a mixture of source and target data distribution. Our class-aware OT leverages a cost function that determines the matching extent between a given data example and a source class-conditional distribution. By optimizing this cost function, we find the optimal matching between target examples and source class-conditional distributions, effectively addressing the data and label shifts that occur between the two domains. To handle the class-aware OT efficiently, we propose an amortization solution that employs deep neural networks to formulate the transportation probabilities and the cost function. Additionally, we propose minimizing class-aware Higher-order Moment Matching (HMM) to align the corresponding class regions on the source and target domains. The class-aware HMM component offers an economical computational approach for accurately evaluating the HMM distance between the two distributions. Extensive experiments on benchmark datasets demonstrate that our proposed method significantly outperforms existing state-of-the-art baselines.
- [1612] arXiv:2401.15957 (cross-list from cs.LG) [ pdf , other ]
-
Title: Scalable Federated Unlearning via Isolated and Coded ShardingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: Federated unlearning has emerged as a promising paradigm to erase the client-level data effect without affecting the performance of collaborative learning models. However, the federated unlearning process often introduces extensive storage overhead and consumes substantial computational resources, thus hindering its implementation in practice. To address this issue, this paper proposes a scalable federated unlearning framework based on isolated sharding and coded computing. We first divide distributed clients into multiple isolated shards across stages to reduce the number of clients being affected. Then, to reduce the storage overhead of the central server, we develop a coded computing mechanism by compressing the model parameters across different shards. In addition, we provide the theoretical analysis of time efficiency and storage effectiveness for the isolated and coded sharding. Finally, extensive experiments on two typical learning tasks, i.e., classification and generation, demonstrate that our proposed framework can achieve better performance than three state-of-the-art frameworks in terms of accuracy, retraining time, storage overhead, and F1 scores for resisting membership inference attacks.
- [1613] arXiv:2401.15963 (cross-list from cs.SE) [ pdf , other ]
-
Title: NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional CorrectnessComments: PreprintSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Existing evaluation benchmarks of language models of code (code LMs) focus almost exclusively on whether the LMs can generate functionally-correct code. In real-world software engineering, developers think beyond functional correctness. They have requirements on "how" a functionality should be implemented to meet overall system design objectives like efficiency, security, and maintainability. They would also trust the code LMs more if the LMs demonstrate robust understanding of requirements and code semantics.
We propose a new benchmark NoFunEval to evaluate code LMs on non-functional requirements and simple classification instances for both functional and non-functional requirements. We propose a prompting method, Coding Concepts (CoCo), as a way for a developer to communicate the domain knowledge to the LMs. We conduct an extensive evaluation of twenty-two code LMs. Our finding is that they generally falter when tested on our benchmark, hinting at fundamental blindspots in their training setups. Surprisingly, even the classification accuracy on functional-correctness instances derived from the popular HumanEval benchmark is low, calling in question the depth of their comprehension and the source of their success in generating functionally-correct code in the first place. We will release our benchmark and evaluation scripts publicly at this https URL . - [1614] arXiv:2401.15966 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic QuestioningAuthors: Kenta Izumi , Hiroki Tanaka , Kazuhiro Shidara , Hiroyoshi Adachi , Daisuke Kanayama , Takashi Kudo , Satoshi NakamuraComments: Accepted by IWSDS2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Dialogue systems controlled by predefined or rule-based scenarios derived from counseling techniques, such as cognitive behavioral therapy (CBT), play an important role in mental health apps. Despite the need for responsible responses, it is conceivable that using the newly emerging LLMs to generate contextually relevant utterances will enhance these apps. In this study, we construct dialogue modules based on a CBT scenario focused on conventional Socratic questioning using two kinds of LLMs: a Transformer-based dialogue model further trained with a social media empathetic counseling dataset, provided by Osaka Prefecture (OsakaED), and GPT-4, a state-of-the art LLM created by OpenAI. By comparing systems that use LLM-generated responses with those that do not, we investigate the impact of generated responses on subjective evaluations such as mood change, cognitive change, and dialogue quality (e.g., empathy). As a result, no notable improvements are observed when using the OsakaED model. When using GPT-4, the amount of mood change, empathy, and other dialogue qualities improve significantly. Results suggest that GPT-4 possesses a high counseling ability. However, they also indicate that even when using a dialogue model trained with a human counseling dataset, it does not necessarily yield better outcomes compared to scenario-based dialogues. While presenting LLM-generated responses, including GPT-4, and having them interact directly with users in real-life mental health care services may raise ethical issues, it is still possible for human professionals to produce example responses or response templates using LLMs in advance in systems that use rules, scenarios, or example responses.
- [1615] arXiv:2401.15969 (cross-list from cs.CV) [ pdf , other ]
-
Title: Routers in Vision Mixture of Experts: An Empirical StudySubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Mixture-of-Experts (MoE) models are a promising way to scale up model capacity without significantly increasing computational cost. A key component of MoEs is the router, which decides which subset of parameters (experts) process which feature embeddings (tokens). In this paper, we present a comprehensive study of routers in MoEs for computer vision tasks. We introduce a unified MoE formulation that subsumes different MoEs with two parametric routing tensors. This formulation covers both sparse MoE, which uses a binary or hard assignment between experts and tokens, and soft MoE, which uses a soft assignment between experts and weighted combinations of tokens. Routers for sparse MoEs can be further grouped into two variants: Token Choice, which matches experts to each token, and Expert Choice, which matches tokens to each expert. We conduct head-to-head experiments with 6 different routers, including existing routers from prior work and new ones we introduce. We show that (i) many routers originally developed for language modeling can be adapted to perform strongly in vision tasks, (ii) in sparse MoE, Expert Choice routers generally outperform Token Choice routers, and (iii) soft MoEs generally outperform sparse MoEs with a fixed compute budget. These results provide new insights regarding the crucial role of routers in vision MoE models.
- [1616] arXiv:2401.15970 (cross-list from cs.CR) [ pdf , other ]
-
Title: HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private InferenceSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Secure two-party computation with homomorphic encryption (HE) protects data privacy with a formal security guarantee but suffers from high communication overhead. While previous works, e.g., Cheetah, Iron, etc, have proposed efficient HE-based protocols for different neural network (NN) operations, they still assume high precision, e.g., fixed point 37 bit, for the NN operations and ignore NNs' native robustness against quantization error. In this paper, we propose HEQuant, which features low-precision-quantization-aware optimization for the HE-based protocols. We observe the benefit of a naive combination of quantization and HE quickly saturates as bit precision goes down. Hence, to further improve communication efficiency, we propose a series of optimizations, including an intra-coefficient packing algorithm and a quantization-aware tiling algorithm, to simultaneously reduce the number and precision of the transferred data. Compared with prior-art HE-based protocols, e.g., CrypTFlow2, Cheetah, Iron, etc, HEQuant achieves $3.5\sim 23.4\times$ communication reduction and $3.0\sim 9.3\times$ latency reduction. Meanwhile, when compared with prior-art network optimization frameworks, e.g., SENet, SNL, etc, HEQuant also achieves $3.1\sim 3.6\times$ communication reduction.
- [1617] arXiv:2401.16011 (cross-list from cs.LG) [ pdf , other ]
-
Title: GPS: Graph Contrastive Learning via Multi-scale Augmented Views from Adversarial PoolingAuthors: Wei Ju , Yiyang Gu , Zhengyang Mao , Ziyue Qiao , Yifang Qin , Xiao Luo , Hui Xiong , Ming ZhangComments: Accepted by SCIENCE CHINA Information Sciences (SCIS 2024)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Self-supervised graph representation learning has recently shown considerable promise in a range of fields, including bioinformatics and social networks. A large number of graph contrastive learning approaches have shown promising performance for representation learning on graphs, which train models by maximizing agreement between original graphs and their augmented views (i.e., positive views). Unfortunately, these methods usually involve pre-defined augmentation strategies based on the knowledge of human experts. Moreover, these strategies may fail to generate challenging positive views to provide sufficient supervision signals. In this paper, we present a novel approach named Graph Pooling ContraSt (GPS) to address these issues. Motivated by the fact that graph pooling can adaptively coarsen the graph with the removal of redundancy, we rethink graph pooling and leverage it to automatically generate multi-scale positive views with varying emphasis on providing challenging positives and preserving semantics, i.e., strongly-augmented view and weakly-augmented view. Then, we incorporate both views into a joint contrastive learning framework with similarity learning and consistency learning, where our pooling module is adversarially trained with respect to the encoder for adversarial robustness. Experiments on twelve datasets on both graph classification and transfer learning tasks verify the superiority of the proposed method over its counterparts.
- [1618] arXiv:2401.16013 (cross-list from cs.RO) [ pdf , other ]
-
Title: SERL: A Software Suite for Sample-Efficient Robotic Reinforcement LearningAuthors: Jianlan Luo , Zheyuan Hu , Charles Xu , You Liang Tan , Jacob Berg , Archit Sharma , Stefan Schaal , Chelsea Finn , Abhishek Gupta , Sergey LevineComments: ICRA 2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, significant progress has been made in the field of robotic reinforcement learning (RL), enabling methods that handle complex image observations, train in the real world, and incorporate auxiliary data, such as demonstrations and prior experience. However, despite these advances, robotic RL remains hard to use. It is acknowledged among practitioners that the particular implementation details of these algorithms are often just as important (if not more so) for performance as the choice of algorithm. We posit that a significant challenge to widespread adoption of robotic RL, as well as further development of robotic RL methods, is the comparative inaccessibility of such methods. To address this challenge, we developed a carefully implemented library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment, a high-quality controller for a widely-adopted robot, and a number of challenging example tasks. We provide this library as a resource for the community, describe its design choices, and present experimental results. Perhaps surprisingly, we find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation between 25 to 50 minutes of training per policy on average, improving over state-of-the-art results reported for similar tasks in the literature. These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent recovery and correction behaviors. We hope that these promising results and our high-quality open-source implementation will provide a tool for the robotics community to facilitate further developments in robotic RL. Our code, documentation, and videos can be found at this https URL
- [1619] arXiv:2401.16024 (cross-list from cs.LG) [ pdf , other ]
-
Title: Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules in Vector-symbolic ArchitecturesComments: Accepted in NeurIPS 2023 Workshop on MATH-AISubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Abstract reasoning is a cornerstone of human intelligence, and replicating it with artificial intelligence (AI) presents an ongoing challenge. This study focuses on efficiently solving Raven's progressive matrices (RPM), a visual test for assessing abstract reasoning abilities, by using distributed computation and operators provided by vector-symbolic architectures (VSA). Instead of hard-coding the rule formulations associated with RPMs, our approach can learn the VSA rule formulations (hence the name Learn-VRF) with just one pass through the training data. Yet, our approach, with compact parameters, remains transparent and interpretable. Learn-VRF yields accurate predictions on I-RAVEN's in-distribution data, and exhibits strong out-of-distribution capabilities concerning unseen attribute-rule pairs, significantly outperforming pure connectionist baselines including large language models. Our code is available at this https URL .
- [1620] arXiv:2401.16094 (cross-list from cs.LG) [ pdf , other ]
-
Title: Federated unsupervised random forest for privacy-preserving patient stratificationAuthors: Bastian Pfeifer , Christel Sirocchi , Marcus D. Bloice , Markus Kreuzthaler , Martin UrschlerSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Quantitative Methods (q-bio.QM)
Abstract: In the realm of precision medicine, effective patient stratification and disease subtyping demand innovative methodologies tailored for multi-omics data. Clustering techniques applied to multi-omics data have become instrumental in identifying distinct subgroups of patients, enabling a finer-grained understanding of disease variability. This work establishes a powerful framework for advancing precision medicine through unsupervised random-forest-based clustering and federated computing. We introduce a novel multi-omics clustering approach utilizing unsupervised random-forests. The unsupervised nature of the random forest enables the determination of cluster-specific feature importance, unraveling key molecular contributors to distinct patient groups. Moreover, our methodology is designed for federated execution, a crucial aspect in the medical domain where privacy concerns are paramount. We have validated our approach on machine learning benchmark data sets as well as on cancer data from The Cancer Genome Atlas (TCGA). Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability. Experiments indicate that local clustering performance can be improved through federated computing.
- [1621] arXiv:2401.16102 (cross-list from cs.LG) [ pdf , other ]
-
Title: Flexible Parallel Neural Network Architecture Model for Early Prediction of Lithium Battery LifeSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Abstract: The early prediction of battery life (EPBL) is vital for enhancing the efficiency and extending the lifespan of lithium batteries. Traditional models with fixed architectures often encounter underfitting or overfitting issues due to the diverse data distributions in different EPBL tasks. An interpretable deep learning model of flexible parallel neural network (FPNN) is proposed, which includes an InceptionBlock, a 3D convolutional neural network (CNN), a 2D CNN, and a dual-stream network. The proposed model effectively extracts electrochemical features from video-like formatted data using the 3D CNN and achieves advanced multi-scale feature abstraction through the InceptionBlock. The FPNN can adaptively adjust the number of InceptionBlocks to flexibly handle tasks of varying complexity in EPBL. The test on the MIT dataset shows that the FPNN model achieves outstanding predictive accuracy in EPBL tasks, with MAPEs of 2.47%, 1.29%, 1.08%, and 0.88% when the input cyclic data volumes are 10, 20, 30, and 40, respectively. The interpretability of the FPNN is mainly reflected in its flexible unit structure and parameter selection: its diverse branching structure enables the model to capture features at different scales, thus allowing the machine to learn informative features. The approach presented herein provides an accurate, adaptable, and comprehensible solution for early life prediction of lithium batteries, opening new possibilities in the field of battery health monitoring.
- [1622] arXiv:2401.16107 (cross-list from cs.CL) [ pdf , other ]
-
Title: Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic DiagnosisSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Automatic diagnosis is a significant application of AI in healthcare, where diagnoses are generated based on the symptom description of patients. Previous works have approached this task directly by modeling the relationship between the normalized symptoms and all possible diseases. However, in the clinical diagnostic process, patients are initially consulted by a general practitioner and, if necessary, referred to specialists in specific domains for a more comprehensive evaluation. The final diagnosis often emerges from a collaborative consultation among medical specialist groups. Recently, large language models have shown impressive capabilities in natural language understanding. In this study, we adopt tuning-free LLM-based agents as medical practitioners and propose the Agent-derived Multi-Specialist Consultation (AMSC) framework to model the diagnosis process in the real world by adaptively fusing probability distributions of agents over potential diseases. Experimental results demonstrate the superiority of our approach compared with baselines. Notably, our approach requires significantly less parameter updating and training time, enhancing efficiency and practical utility. Furthermore, we delve into a novel perspective on the role of implicit symptoms within the context of automatic diagnosis.
- [1623] arXiv:2401.16123 (cross-list from cs.HC) [ pdf , other ]
-
Title: Looking for a better fit? An Incremental Learning Multimodal Object Referencing Framework adapting to Individual DriversComments: Accepted for publication in the Proceedings of the 29th International Conference on Intelligent User Interfaces (IUI'24), March 18--21, 2024, in Greenville, SC, USASubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: The rapid advancement of the automotive industry towards automated and semi-automated vehicles has rendered traditional methods of vehicle interaction, such as touch-based and voice command systems, inadequate for a widening range of non-driving related tasks, such as referencing objects outside of the vehicle. Consequently, research has shifted toward gestural input (e.g., hand, gaze, and head pose gestures) as a more suitable mode of interaction during driving. However, due to the dynamic nature of driving and individual variation, there are significant differences in drivers' gestural input performance. While, in theory, this inherent variability could be moderated by substantial data-driven machine learning models, prevalent methodologies lean towards constrained, single-instance trained models for object referencing. These models show a limited capacity to continuously adapt to the divergent behaviors of individual drivers and the variety of driving scenarios. To address this, we propose \textit{IcRegress}, a novel regression-based incremental learning approach that adapts to changing behavior and the unique characteristics of drivers engaged in the dual task of driving and referencing objects. We suggest a more personalized and adaptable solution for multimodal gestural interfaces, employing continuous lifelong learning to enhance driver experience, safety, and convenience. Our approach was evaluated using an outside-the-vehicle object referencing use case, highlighting the superiority of the incremental learning models adapted over a single trained model across various driver traits such as handedness, driving experience, and numerous driving conditions. Finally, to facilitate reproducibility, ease deployment, and promote further research, we offer our approach as an open-source framework at \url{ this https URL }.
- [1624] arXiv:2401.16136 (cross-list from cs.CR) [ pdf , other ]
-
Title: Neural Network Training on Encrypted Data with TFHESubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: We present an approach to outsourcing of training neural networks while preserving data confidentiality from malicious parties. We use fully homomorphic encryption to build a unified training approach that works on encrypted data and learns quantized neural network models. The data can be horizontally or vertically split between multiple parties, enabling collaboration on confidential data. We train logistic regression and multi-layer perceptrons on several datasets.
- [1625] arXiv:2401.16137 (cross-list from cs.LG) [ pdf , other ]
-
Title: X-PEFT: eXtremely Parameter-Efficient Fine-Tuning for Extreme Multi-Profile ScenariosSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Parameter-efficient fine-tuning (PEFT) techniques, such as adapter tuning, aim to fine-tune a pre-trained language model (PLM) using a minimal number of parameters for a specific task or profile. Although adapter tuning provides increased parameter efficiency compared to full-model fine-tuning, it introduces a small set of additional parameters attached to a PLM for each profile. This can become problematic in practical applications with multiple profiles, particularly when a significant increase in the number of profiles linearly boosts the total number of additional parameters. To mitigate this issue, we introduce X-PEFT, a novel PEFT method that leverages a multitude of given adapters by fine-tuning an extremely small set of compact tensors for a new profile, which serve as binary masks to adaptively select the given adapters. To efficiently validate our proposed method, we implement it using a large number of trained or untrained (random) adapters. We evaluate the performance of X-PEFT through LaMP and GLUE tasks and demonstrate that it either matches or surpasses the effectiveness of conventional adapter tuning, despite reducing the memory requirements per profile by a factor of 10,000 compared to it.
- [1626] arXiv:2401.16144 (cross-list from cs.CV) [ pdf , other ]
-
Title: Divide and Conquer: Rethinking the Training Paradigm of Neural Radiance FieldsAuthors: Rongkai Ma , Leo Lebrat , Rodrigo Santa Cruz , Gil Avraham , Yan Zuo , Clinton Fookes , Olivier SalvadoSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Neural radiance fields (NeRFs) have exhibited potential in synthesizing high-fidelity views of 3D scenes but the standard training paradigm of NeRF presupposes an equal importance for each image in the training set. This assumption poses a significant challenge for rendering specific views presenting intricate geometries, thereby resulting in suboptimal performance. In this paper, we take a closer look at the implications of the current training paradigm and redesign this for more superior rendering quality by NeRFs. Dividing input views into multiple groups based on their visual similarities and training individual models on each of these groups enables each model to specialize on specific regions without sacrificing speed or efficiency. Subsequently, the knowledge of these specialized models is aggregated into a single entity via a teacher-student distillation paradigm, enabling spatial efficiency for online render-ing. Empirically, we evaluate our novel training framework on two publicly available datasets, namely NeRF synthetic and Tanks&Temples. Our evaluation demonstrates that our DaC training pipeline enhances the rendering quality of a state-of-the-art baseline model while exhibiting convergence to a superior minimum.
- [1627] arXiv:2401.16157 (cross-list from cs.CV) [ pdf , other ]
-
Title: Spatial-Aware Latent Initialization for Controllable Image GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Recently, text-to-image diffusion models have demonstrated impressive ability to generate high-quality images conditioned on the textual input. However, these models struggle to accurately adhere to textual instructions regarding spatial layout information. While previous research has primarily focused on aligning cross-attention maps with layout conditions, they overlook the impact of the initialization noise on the layout guidance. To achieve better layout control, we propose leveraging a spatial-aware initialization noise during the denoising process. Specifically, we find that the inverted reference image with finite inversion steps contains valuable spatial awareness regarding the object's position, resulting in similar layouts in the generated images. Based on this observation, we develop an open-vocabulary framework to customize a spatial-aware initialization noise for each layout condition. Without modifying other modules except the initialization noise, our approach can be seamlessly integrated as a plug-and-play module within other training-free layout guidance frameworks. We evaluate our approach quantitatively and qualitatively on the available Stable Diffusion model and COCO dataset. Equipped with the spatial-aware latent initialization, our method significantly improves the effectiveness of layout guidance while preserving high-quality content.
- [1628] arXiv:2401.16176 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Survey on Structure-Preserving Graph TransformersComments: 12Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The transformer architecture has shown remarkable success in various domains, such as natural language processing and computer vision. When it comes to graph learning, transformers are required not only to capture the interactions between pairs of nodes but also to preserve graph structures connoting the underlying relations and proximity between them, showing the expressive power to capture different graph structures. Accordingly, various structure-preserving graph transformers have been proposed and widely used for various tasks, such as graph-level tasks in bioinformatics and chemoinformatics. However, strategies related to graph structure preservation have not been well organized and systematized in the literature. In this paper, we provide a comprehensive overview of structure-preserving graph transformers and generalize these methods from the perspective of their design objective. First, we divide strategies into four main groups: node feature modulation, context node sampling, graph rewriting, and transformer architecture improvements. We then further divide the strategies according to the coverage and goals of graph structure preservation. Furthermore, we also discuss challenges and future directions for graph transformer models to preserve the graph structure and understand the nature of graphs.
- [1629] arXiv:2401.16182 (cross-list from cs.CL) [ pdf , other ]
-
Title: LLaMandement: Large Language Models for Summarization of French Legislative ProposalsAuthors: Joseph Gesnouin , Yannis Tannier , Christophe Gomes Da Silva , Hatim Tapory , Camille Brier , Hugo Simon , Raphael Rozenberg , Hermann Woehrel , Mehdi El Yakaabi , Thomas Binder , Guillaume Marie , Emilie Caron , Mathile Nogueira , Thomas Fontas , Laure Puydebois , Marie Theophile , Stephane Morandi , Mael Petit , David Creissac , Pauline Ennouchy , Elise Valetoux , Celine Visade , Severine Balloux , Emmanuel Cortes , Pierre-Etienne Devineau , Ulrich Tan , Esther Mac Namara , Su YangComments: 21 pages, 9 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This report introduces LLaMandement, a state-of-the-art Large Language Model, fine-tuned by the French government and designed to enhance the efficiency and efficacy of processing parliamentary sessions (including the production of bench memoranda and documents required for interministerial meetings) by generating neutral summaries of legislative proposals. Addressing the administrative challenges of manually processing a growing volume of legislative amendments, LLaMandement stands as a significant legal technological milestone, providing a solution that exceeds the scalability of traditional human efforts while matching the robustness of a specialized legal drafter. We release all our fine-tuned models and training data to the community.
- [1630] arXiv:2401.16185 (cross-list from cs.CR) [ pdf , other ]
-
Title: LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability ReasoningAuthors: Yuqiang Sun , Daoyuan Wu , Yue Xue , Han Liu , Wei Ma , Lyuye Zhang , Miaolei Shi , Yang LiuComments: This is a technical report by Nanyang Technological UniversitySubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: Large language models (LLMs) have demonstrated significant potential for many downstream tasks, including those requiring human-level intelligence, such as vulnerability detection. However, recent attempts to use LLMs for vulnerability detection are still preliminary, as they lack an in-depth understanding of a subject LLM's vulnerability reasoning capability -- whether it originates from the model itself or from external assistance, such as invoking tool support and retrieving vulnerability knowledge. In this paper, we aim to decouple LLMs' vulnerability reasoning capability from their other capabilities, including the ability to actively seek additional information (e.g., via function calling in SOTA models), adopt relevant vulnerability knowledge (e.g., via vector-based matching and retrieval), and follow instructions to output structured results. To this end, we propose a unified evaluation framework named LLM4Vuln, which separates LLMs' vulnerability reasoning from their other capabilities and evaluates how LLMs' vulnerability reasoning could be enhanced when combined with the enhancement of other capabilities. To demonstrate the effectiveness of LLM4Vuln, we have designed controlled experiments using 75 ground-truth smart contract vulnerabilities, which were extensively audited as high-risk on Code4rena from August to November 2023, and tested them in 4,950 different scenarios across three representative LLMs (GPT-4, Mixtral, and Code Llama). Our results not only reveal ten findings regarding the varying effects of knowledge enhancement, context supplementation, prompt schemes, and models but also enable us to identify 9 zero-day vulnerabilities in two pilot bug bounty programs with over 1,000 USD being awarded.
- [1631] arXiv:2401.16186 (cross-list from cs.SE) [ pdf , other ]
-
Title: An Empirical Study on Usage and Perceptions of LLMs in a Software Engineering ProjectComments: 8 pages, 6 figures, accepted for publication at the LLM4Code workshop @ ICSE 2024Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) represent a leap in artificial intelligence, excelling in tasks using human language(s). Although the main focus of general-purpose LLMs is not code generation, they have shown promising results in the domain. However, the usefulness of LLMs in an academic software engineering project has not been fully explored yet. In this study, we explore the usefulness of LLMs for 214 students working in teams consisting of up to six members. Notably, in the academic course through which this study is conducted, students were encouraged to integrate LLMs into their development tool-chain, in contrast to most other academic courses that explicitly prohibit the use of LLMs.
In this paper, we analyze the AI-generated code, prompts used for code generation, and the human intervention levels to integrate the code into the code base. We also conduct a perception study to gain insights into the perceived usefulness, influencing factors, and future outlook of LLM from a computer science student's perspective. Our findings suggest that LLMs can play a crucial role in the early stages of software development, especially in generating foundational code structures, and helping with syntax and error debugging. These insights provide us with a framework on how to effectively utilize LLMs as a tool to enhance the productivity of software engineering students, and highlight the necessity of shifting the educational focus toward preparing students for successful human-AI collaboration. - [1632] arXiv:2401.16198 (cross-list from cs.GT) [ pdf , other ]
-
Title: Contracting with a Learning AgentAuthors: Guru Guruganesh , Yoav Kolumbus , Jon Schneider , Inbal Talgam-Cohen , Emmanouil-Vasileios Vlatakis-Gkaragkounis , Joshua R. Wang , S. Matthew WeinbergSubjects: Computer Science and Game Theory (cs.GT) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Theoretical Economics (econ.TH)
Abstract: Many real-life contractual relations differ completely from the clean, static model at the heart of principal-agent theory. Typically, they involve repeated strategic interactions of the principal and agent, taking place under uncertainty and over time. While appealing in theory, players seldom use complex dynamic strategies in practice, often preferring to circumvent complexity and approach uncertainty through learning. We initiate the study of repeated contracts with a learning agent, focusing on agents who achieve no-regret outcomes.
Optimizing against a no-regret agent is a known open problem in general games; we achieve an optimal solution to this problem for a canonical contract setting, in which the agent's choice among multiple actions leads to success/failure. The solution has a surprisingly simple structure: for some $\alpha > 0$, initially offer the agent a linear contract with scalar $\alpha$, then switch to offering a linear contract with scalar $0$. This switch causes the agent to ``free-fall'' through their action space and during this time provides the principal with non-zero reward at zero cost. Despite apparent exploitation of the agent, this dynamic contract can leave \emph{both} players better off compared to the best static contract. Our results generalize beyond success/failure, to arbitrary non-linear contracts which the principal rescales dynamically.
Finally, we quantify the dependence of our results on knowledge of the time horizon, and are the first to address this consideration in the study of strategizing against learning agents. - [1633] arXiv:2401.16209 (cross-list from cs.CL) [ pdf , other ]
-
Title: MultiMUC: Multilingual Template Filling on MUC-4Authors: William Gantt , Shabnam Behzad , Hannah YoungEun An , Yunmo Chen , Aaron Steven White , Benjamin Van Durme , Mahsa YarmohammadiComments: EACL 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We introduce MultiMUC, the first multilingual parallel corpus for template filling, comprising translations of the classic MUC-4 template filling benchmark into five languages: Arabic, Chinese, Farsi, Korean, and Russian. We obtain automatic translations from a strong multilingual machine translation system and manually project the original English annotations into each target language. For all languages, we also provide human translations for sentences in the dev and test splits that contain annotated template arguments. Finally, we present baselines on MultiMUC both with state-of-the-art template filling models and with ChatGPT.
- [1634] arXiv:2401.16215 (cross-list from cs.LG) [ pdf , other ]
-
Title: Learning big logical rules by joining small rulesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Abstract: A major challenge in inductive logic programming is learning big rules. To address this challenge, we introduce an approach where we join small rules to learn big rules. We implement our approach in a constraint-driven system and use constraint solvers to efficiently join rules. Our experiments on many domains, including game playing and drug design, show that our approach can (i) learn rules with more than 100 literals, and (ii) drastically outperform existing approaches in terms of predictive accuracies.
- [1635] arXiv:2401.16240 (cross-list from cs.CL) [ pdf , other ]
-
Title: Combining Hierachical VAEs with LLMs for clinically meaningful timeline summarisation in social mediaSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We introduce a hybrid abstractive summarisation approach combining hierarchical VAE with LLMs (LlaMA-2) to produce clinically meaningful summaries from social media user timelines, appropriate for mental health monitoring. The summaries combine two different narrative points of view: clinical insights in third person useful for a clinician are generated by feeding into an LLM specialised clinical prompts, and importantly, a temporally sensitive abstractive summary of the user's timeline in first person, generated by a novel hierarchical variational autoencoder, TH-VAE. We assess the generated summaries via automatic evaluation against expert summaries and via human evaluation with clinical experts, showing that timeline summarisation by TH-VAE results in more factual and logically coherent summaries rich in clinical utility and superior to LLM-only approaches in capturing changes over time.
- [1636] arXiv:2401.16251 (cross-list from cs.CR) [ pdf , other ]
-
Title: Cross-silo Federated Learning with Record-level Personalized Differential PrivacyComments: 12 pages, 7 figures, under reviewSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Federated learning enhanced by differential privacy has emerged as a popular approach to better safeguard the privacy of client-side data by protecting clients' contributions during the training process. Existing solutions typically assume a uniform privacy budget for all records and provide one-size-fits-all solutions that may not be adequate to meet each record's privacy requirement. In this paper, we explore the uncharted territory of cross-silo FL with record-level personalized differential privacy. We devise a novel framework named rPDP-FL, employing a two-stage hybrid sampling scheme with both client-level sampling and non-uniform record-level sampling to accommodate varying privacy requirements. A critical and non-trivial problem is to select the ideal per-record sampling probability q given the personalized privacy budget {\epsilon}. We introduce a versatile solution named Simulation-CurveFitting, allowing us to uncover a significant insight into the nonlinear correlation between q and {\epsilon} and derive an elegant mathematical model to tackle the problem. Our evaluation demonstrates that our solution can provide significant performance gains over the baselines that do not consider personalized privacy preservation.
- [1637] arXiv:2401.16258 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: MosquIoT: A System Based on IoT and Machine Learning for the Monitoring of Aedes aegypti (Diptera: Culicidae)Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
Abstract: Millions of people around the world are infected with mosquito-borne diseases each year. One of the most dangerous species is Aedes aegypti, the main vector of viruses such as dengue, yellow fever, chikungunya, and Zika, among others. Mosquito prevention and eradication campaigns are essential to avoid major public health consequences. In this respect, entomological surveillance is an important tool. At present, this traditional monitoring tool is executed manually and requires digital transformation to help authorities make better decisions, improve their planning efforts, speed up execution, and better manage available resources. Therefore, new technological tools based on proven techniques need to be designed and developed. However, such tools should also be cost-effective, autonomous, reliable, and easy to implement, and should be enabled by connectivity and multi-platform software applications. This paper presents the design, development, and testing of an innovative system named MosquIoT. It is based on traditional ovitraps with embedded Internet of Things (IoT) and Tiny Machine Learning (TinyML) technologies, which enable the detection and quantification of Ae. aegypti eggs. This innovative and promising solution may help dynamically understand the behavior of Ae. aegypti populations in cities, shifting from the current reactive entomological monitoring model to a proactive and predictive digital one.
- [1638] arXiv:2401.16282 (cross-list from cs.CL) [ pdf , other ]
-
Title: MAPLE: Micro Analysis of Pairwise Language Evolution for Few-Shot Claim VerificationComments: accepted by EACL Findings 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Claim verification is an essential step in the automated fact-checking pipeline which assesses the veracity of a claim against a piece of evidence. In this work, we explore the potential of few-shot claim verification, where only very limited data is available for supervision. We propose MAPLE (Micro Analysis of Pairwise Language Evolution), a pioneering approach that explores the alignment between a claim and its evidence with a small seq2seq model and a novel semantic measure. Its innovative utilization of micro language evolution path leverages unlabelled pairwise data to facilitate claim verification while imposing low demand on data annotations and computing resources. MAPLE demonstrates significant performance improvements over SOTA baselines SEED, PET and LLaMA 2 across three fact-checking datasets: FEVER, Climate FEVER, and SciFact. Data and code are available here: this https URL
- [1639] arXiv:2401.16285 (cross-list from cs.CL) [ pdf , other ]
-
Title: Capturing Pertinent Symbolic Features for Enhanced Content-Based Misinformation DetectionComments: Accepted at K-CAP'23: The 12th Knowledge Capture ConferenceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Preventing the spread of misinformation is challenging. The detection of misleading content presents a significant hurdle due to its extreme linguistic and domain variability. Content-based models have managed to identify deceptive language by learning representations from textual data such as social media posts and web articles. However, aggregating representative samples of this heterogeneous phenomenon and implementing effective real-world applications is still elusive. Based on analytical work on the language of misinformation, this paper analyzes the linguistic attributes that characterize this phenomenon and how representative of such features some of the most popular misinformation datasets are. We demonstrate that the appropriate use of pertinent symbolic knowledge in combination with neural language models is helpful in detecting misleading content. Our results achieve state-of-the-art performance in misinformation datasets across the board, showing that our approach offers a valid and robust alternative to multi-task transfer learning without requiring any additional training data. Furthermore, our results show evidence that structured knowledge can provide the extra boost required to address a complex and unpredictable real-world problem like misinformation detection, not only in terms of accuracy but also time efficiency and resource utilization.
- [1640] arXiv:2401.16293 (cross-list from cs.CL) [ pdf , other ]
-
Title: Textual Entailment for Effective Triple Validation in Object PredictionComments: Accepted to ISWC'23 - The International Semantic Web ConferenceSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Digital Libraries (cs.DL)
Abstract: Knowledge base population seeks to expand knowledge graphs with facts that are typically extracted from a text corpus. Recently, language models pretrained on large corpora have been shown to contain factual knowledge that can be retrieved using cloze-style strategies. Such approach enables zero-shot recall of facts, showing competitive results in object prediction compared to supervised baselines. However, prompt-based fact retrieval can be brittle and heavily depend on the prompts and context used, which may produce results that are unintended or hallucinatory.We propose to use textual entailment to validate facts extracted from language models through cloze statements. Our results show that triple validation based on textual entailment improves language model predictions in different training regimes. Furthermore, we show that entailment-based triple validation is also effective to validate candidate facts extracted from other sources including existing knowledge graphs and text passages where named entities are recognized.
- [1641] arXiv:2401.16294 (cross-list from cs.LG) [ pdf , other ]
-
Title: Dual feature-based and example-based explanation methodsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: A new approach to the local and global explanation is proposed. It is based on selecting a convex hull constructed for the finite number of points around an explained instance. The convex hull allows us to consider a dual representation of instances in the form of convex combinations of extreme points of a produced polytope. Instead of perturbing new instances in the Euclidean feature space, vectors of convex combination coefficients are uniformly generated from the unit simplex, and they form a new dual dataset. A dual linear surrogate model is trained on the dual dataset. The explanation feature importance values are computed by means of simple matrix calculations. The approach can be regarded as a modification of the well-known model LIME. The dual representation inherently allows us to get the example-based explanation. The neural additive model is also considered as a tool for implementing the example-based explanation approach. Many numerical experiments with real datasets are performed for studying the approach. The code of proposed algorithms is available.
- [1642] arXiv:2401.16298 (cross-list from cs.CV) [ pdf , other ]
-
Title: Breaking the Barrier: Selective Uncertainty-based Active Learning for Medical Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Active learning (AL) has found wide applications in medical image segmentation, aiming to alleviate the annotation workload and enhance performance. Conventional uncertainty-based AL methods, such as entropy and Bayesian, often rely on an aggregate of all pixel-level metrics. However, in imbalanced settings, these methods tend to neglect the significance of target regions, eg., lesions, and tumors. Moreover, uncertainty-based selection introduces redundancy. These factors lead to unsatisfactory performance, and in many cases, even underperform random sampling. To solve this problem, we introduce a novel approach called the Selective Uncertainty-based AL, avoiding the conventional practice of summing up the metrics of all pixels. Through a filtering process, our strategy prioritizes pixels within target areas and those near decision boundaries. This resolves the aforementioned disregard for target areas and redundancy. Our method showed substantial improvements across five different uncertainty-based methods and two distinct datasets, utilizing fewer labeled data to reach the supervised baseline and consistently achieving the highest overall performance. Our code is available at this https URL \_Uncertainty\_AL.
- [1643] arXiv:2401.16299 (cross-list from cs.LG) [ pdf , other ]
-
Title: Enhancing Molecular Property Prediction with Auxiliary Learning and Task-Specific AdaptationSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Pretrained Graph Neural Networks have been widely adopted for various molecular property prediction tasks. Despite their ability to encode structural and relational features of molecules, traditional fine-tuning of such pretrained GNNs on the target task can lead to poor generalization. To address this, we explore the adaptation of pretrained GNNs to the target task by jointly training them with multiple auxiliary tasks. This could enable the GNNs to learn both general and task-specific features, which may benefit the target task. However, a major challenge is to determine the relatedness of auxiliary tasks with the target task. To address this, we investigate multiple strategies to measure the relevance of auxiliary tasks and integrate such tasks by adaptively combining task gradients or by learning task weights via bi-level optimization. Additionally, we propose a novel gradient surgery-based approach, Rotation of Conflicting Gradients ($\mathtt{RCGrad}$), that learns to align conflicting auxiliary task gradients through rotation. Our experiments with state-of-the-art pretrained GNNs demonstrate the efficacy of our proposed methods, with improvements of up to 7.7% over fine-tuning. This suggests that incorporating auxiliary tasks along with target task fine-tuning can be an effective way to improve the generalizability of pretrained GNNs for molecular property prediction.
- [1644] arXiv:2401.16310 (cross-list from cs.SE) [ pdf , other ]
-
Title: Security Code Review by LLMs: A Deep Dive into ResponsesAuthors: Jiaxin Yu , Peng Liang , Yujia Fu , Amjed Tahir , Mojtaba Shahin , Chong Wang , Yangxiao CaiSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Security code review aims to combine automated tools and manual efforts to detect security defects during development. The rapid development of Large Language Models (LLMs) has shown promising potential in software development, as well as opening up new possibilities in automated security code review. To explore the challenges of applying LLMs in practical code review for security defect detection, this study compared the detection performance of three state-of-the-art LLMs (Gemini Pro, GPT-4, and GPT-3.5) under five prompts on 549 code files that contain security defects from real-world code reviews. Through analyzing 82 responses generated by the best-performing LLM-prompt combination based on 100 randomly selected code files, we extracted and categorized quality problems present in these responses into 5 themes and 16 categories. Our results indicate that the responses produced by LLMs often suffer from verbosity, vagueness, and incompleteness, highlighting the necessity to enhance their conciseness, understandability, and compliance to security defect detection. This work reveals the deficiencies of LLM-generated responses in security code review and paves the way for future optimization of LLMs towards this task.
- [1645] arXiv:2401.16318 (cross-list from cs.LG) [ pdf , other ]
-
Title: Defining and Extracting generalizable interaction primitives from DNNsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Faithfully summarizing the knowledge encoded by a deep neural network (DNN) into a few symbolic primitive patterns without losing much information represents a core challenge in explainable AI. To this end, Ren et al. (2023c) have derived a series of theorems to prove that the inference score of a DNN can be explained as a small set of interactions between input variables. However, the lack of generalization power makes it still hard to consider such interactions as faithful primitive patterns encoded by the DNN. Therefore, given different DNNs trained for the same task, we develop a new method to extract interactions that are shared by these DNNs. Experiments show that the extracted interactions can better reflect common knowledge shared by different DNNs.
- [1646] arXiv:2401.16332 (cross-list from cs.CL) [ pdf , other ]
-
Title: Tradeoffs Between Alignment and Helpfulness in Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Language model alignment has become an important component of AI safety, allowing safe interactions between humans and language models, by enhancing desired behaviors and inhibiting undesired ones. It is often done by tuning the model or inserting preset aligning prompts. Recently, representation engineering, a method which alters the model's behavior via changing its representations post-training, was shown to be effective in aligning LLMs (Zou et al., 2023a). Representation engineering yields gains in alignment oriented tasks such as resistance to adversarial attacks and reduction of social biases, but was also shown to cause a decrease in the ability of the model to perform basic tasks. In this paper we study the tradeoff between the increase in alignment and decrease in helpfulness of the model. We propose a theoretical framework which provides bounds for these two quantities, and demonstrate their relevance empirically. Interestingly, we find that while the helpfulness generally decreases, it does so quadratically with the norm of the representation engineering vector, while the alignment increases linearly with it, indicating a regime in which it is efficient to use representation engineering. We validate our findings empirically, and chart the boundaries to the usefulness of representation engineering for alignment.
- [1647] arXiv:2401.16335 (cross-list from cs.LG) [ pdf , other ]
-
Title: Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHFSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Abstract: Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns language models closely with human-centric values. The initial phase of RLHF involves learning human values using a reward model from ranking data. It is observed that the performance of the reward model degrades after one epoch of training, and optimizing too much against the learned reward model eventually hinders the true objective. This paper delves into these issues, leveraging the theoretical insights to design improved reward learning algorithm termed 'Iterative Data Smoothing' (IDS). The core idea is that during each training epoch, we not only update the model with the data, but also update the date using the model, replacing hard labels with soft labels. Our empirical findings highlight the superior performance of this approach over the traditional methods.
- [1648] arXiv:2401.16350 (cross-list from cs.LG) [ pdf , other ]
-
Title: FedFair^3: Unlocking Threefold Fairness in Federated LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Federated Learning (FL) is an emerging paradigm in machine learning without exposing clients' raw data. In practical scenarios with numerous clients, encouraging fair and efficient client participation in federated learning is of utmost importance, which is also challenging given the heterogeneity in data distribution and device properties. Existing works have proposed different client-selection methods that consider fairness; however, they fail to select clients with high utilities while simultaneously achieving fair accuracy levels. In this paper, we propose a fair client-selection approach that unlocks threefold fairness in federated learning. In addition to having a fair client-selection strategy, we enforce an equitable number of rounds for client participation and ensure a fair accuracy distribution over the clients. The experimental results demonstrate that FedFair^3, in comparison to the state-of-the-art baselines, achieves 18.15% less accuracy variance on the IID data and 54.78% on the non-IID data, without decreasing the global accuracy. Furthermore, it shows 24.36% less wall-clock training time on average.
- [1649] arXiv:2401.16352 (cross-list from cs.CV) [ pdf , other ]
-
Title: Adversarial Training on Purification (AToP): Advancing Both Robustness and GeneralizationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: The deep neural networks are known to be vulnerable to well-designed adversarial attacks. The most successful defense technique based on adversarial training (AT) can achieve optimal robustness against particular attacks but cannot generalize well to unseen attacks. Another effective defense technique based on adversarial purification (AP) can enhance generalization but cannot achieve optimal robustness. Meanwhile, both methods share one common limitation on the degraded standard accuracy. To mitigate these issues, we propose a novel pipeline to acquire the robust purifier model, named Adversarial Training on Purification (AToP), which comprises two components: perturbation destruction by random transforms (RT) and purifier model fine-tuned (FT) by adversarial loss. RT is essential to avoid overlearning to known attacks, resulting in the robustness generalization to unseen attacks, and FT is essential for the improvement of robustness. To evaluate our method in an efficient and scalable way, we conduct extensive experiments on CIFAR-10, CIFAR-100, and ImageNette to demonstrate that our method achieves optimal robustness and exhibits generalization ability against unseen attacks.
- [1650] arXiv:2401.16367 (cross-list from cs.LG) [ pdf , other ]
-
Title: TQCompressor: improving tensor decomposition methods in neural networks via permutationsAuthors: V. Abronin , A. Naumov , D. Mazur , D. Bystrov , K. Tsarova , Ar. Melnikov , I. Oseledets , S. Dolgov , R. Brasher , M. PerelshteinSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We introduce TQCompressor, a novel method for neural network model compression with improved tensor decompositions. We explore the challenges posed by the computational and storage demands of pre-trained language models in NLP tasks and propose a permutation-based enhancement to Kronecker decomposition. This enhancement makes it possible to reduce loss in model expressivity which is usually associated with factorization. We demonstrate this method applied to the GPT-2$_{small}$. The result of the compression is TQCompressedGPT-2 model, featuring 81 mln. parameters compared to 124 mln. in the GPT-2$_{small}$. We make TQCompressedGPT-2 publicly available. We further enhance the performance of the TQCompressedGPT-2 through a training strategy involving multi-step knowledge distillation, using only a 3.1% of the OpenWebText. TQCompressedGPT-2 surpasses DistilGPT-2 and KnGPT-2 in comparative evaluations, marking an advancement in the efficient and effective deployment of models in resource-constrained environments.
- [1651] arXiv:2401.16402 (cross-list from cs.CV) [ pdf , other ]
-
Title: A Survey on Visual Anomaly Detection: Challenge, Approach, and ProspectAuthors: Yunkang Cao , Xiaohao Xu , Jiangning Zhang , Yuqi Cheng , Xiaonan Huang , Guansong Pang , Weiming ShenComments: Work in progress. Yunkang Cao, Xiaohao Xu, and Jiangning Zhang contribute equally to this workSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Visual Anomaly Detection (VAD) endeavors to pinpoint deviations from the concept of normality in visual data, widely applied across diverse domains, e.g., industrial defect inspection, and medical lesion detection. This survey comprehensively examines recent advancements in VAD by identifying three primary challenges: 1) scarcity of training data, 2) diversity of visual modalities, and 3) complexity of hierarchical anomalies. Starting with a brief overview of the VAD background and its generic concept definitions, we progressively categorize, emphasize, and discuss the latest VAD progress from the perspective of sample number, data modality, and anomaly hierarchy. Through an in-depth analysis of the VAD field, we finally summarize future developments for VAD and conclude the key findings and contributions of this survey.
- [1652] arXiv:2401.16405 (cross-list from cs.CL) [ pdf , other ]
-
Title: Scaling Sparse Fine-Tuning to Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Large Language Models (LLMs) are difficult to fully fine-tune (e.g., with instructions or human feedback) due to their sheer number of parameters. A family of parameter-efficient sparse fine-tuning methods have proven promising in terms of performance but their memory requirements increase proportionally to the size of the LLMs. In this work, we scale sparse fine-tuning to state-of-the-art LLMs like LLaMA 2 7B and 13B. We propose SpIEL, a novel sparse fine-tuning method which, for a desired density level, maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values. It iterates over: (a) updating the active deltas, (b) pruning indices (based on the change of magnitude of their deltas) and (c) regrowth of indices. For regrowth, we explore two criteria based on either the accumulated gradients of a few candidate parameters or their approximate momenta estimated using the efficient SM3 optimizer. We experiment with instruction-tuning of LLMs on standard dataset mixtures, finding that SpIEL is often superior to popular parameter-efficient fine-tuning methods like LoRA (low-rank adaptation) in terms of performance and comparable in terms of run time. We additionally show that SpIEL is compatible with both quantization and efficient optimizers, to facilitate scaling to ever-larger model sizes. We release the code for SpIEL at this https URL and for the instruction-tuning experiments at this https URL .
- [1653] arXiv:2401.16421 (cross-list from cs.LG) [ pdf , other ]
-
Title: Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length ExtrapolationAuthors: Zhenyu He , Guhao Feng , Shengjie Luo , Kai Yang , Di He , Jingjing Xu , Zhi Zhang , Hongxia Yang , Liwei WangComments: 17 pages, 7 figures, 8 tables; Working in ProgressSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Abstract: In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
- [1654] arXiv:2401.16438 (cross-list from cs.LG) [ pdf , other ]
-
Title: Do deep neural networks utilize the weight space efficiently?Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Deep learning models like Transformers and Convolutional Neural Networks (CNNs) have revolutionized various domains, but their parameter-intensive nature hampers deployment in resource-constrained settings. In this paper, we introduce a novel concept utilizes column space and row space of weight matrices, which allows for a substantial reduction in model parameters without compromising performance. Leveraging this paradigm, we achieve parameter-efficient deep learning models.. Our approach applies to both Bottleneck and Attention layers, effectively halving the parameters while incurring only minor performance degradation. Extensive experiments conducted on the ImageNet dataset with ViT and ResNet50 demonstrate the effectiveness of our method, showcasing competitive performance when compared to traditional models. This approach not only addresses the pressing demand for parameter efficient deep learning solutions but also holds great promise for practical deployment in real-world scenarios.
- [1655] arXiv:2401.16440 (cross-list from cs.LG) [ pdf , other ]
-
Title: Beyond Eviction Prediction: Leveraging Local Spatiotemporal Public Records to Inform ActionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: There has been considerable recent interest in scoring properties on the basis of eviction risk. The success of methods for eviction prediction is typically evaluated using different measures of predictive accuracy. However, the underlying goal of such prediction is to direct appropriate assistance to households that may be at greater risk so they remain stably housed. Thus, we must ask the question of how useful such predictions are in targeting outreach efforts - informing action. In this paper, we investigate this question using a novel dataset that matches information on properties, evictions, and owners. We perform an eviction prediction task to produce risk scores and then use these risk scores to plan targeted outreach policies. We show that the risk scores are, in fact, useful, enabling a theoretical team of caseworkers to reach more eviction-prone properties in the same amount of time, compared to outreach policies that are either neighborhood-based or focus on buildings with a recent history of evictions. We also discuss the importance of neighborhood and ownership features in both risk prediction and targeted outreach.
- [1656] arXiv:2401.16441 (cross-list from cs.LG) [ pdf , other ]
-
Title: FaKnow: A Unified Library for Fake News DetectionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Over the past years, a large number of fake news detection algorithms based on deep learning have emerged. However, they are often developed under different frameworks, each mandating distinct utilization methodologies, consequently hindering reproducibility. Additionally, a substantial amount of redundancy characterizes the code development of such fake news detection models. To address these concerns, we propose FaKnow, a unified and comprehensive fake news detection algorithm library. It encompasses a variety of widely used fake news detection models, categorized as content-based and social context-based approaches. This library covers the full spectrum of the model training and evaluation process, effectively organizing the data, models, and training procedures within a unified framework. Furthermore, it furnishes a series of auxiliary functionalities and tools, including visualization, and logging. Our work contributes to the standardization and unification of fake news detection research, concurrently facilitating the endeavors of researchers in this field. The open-source code and documentation can be accessed at this https URL and this https URL , respectively.
- [1657] arXiv:2401.16443 (cross-list from cs.HC) [ pdf , other ]
-
Title: Evaluating Deep Networks for Detecting User Familiarity with VR from Hand InteractionsComments: AIxVR 2024 poster paperSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: As VR devices become more prevalent in the consumer space, VR applications are likely to be increasingly used by users unfamiliar with VR. Detecting the familiarity level of a user with VR as an interaction medium provides the potential of providing on-demand training for acclimatization and prevents the user from being burdened by the VR environment in accomplishing their tasks. In this work, we present preliminary results of using deep classifiers to conduct automatic detection of familiarity with VR by using hand tracking of the user as they interact with a numeric passcode entry panel to unlock a VR door. We use a VR door as we envision it to the first point of entry to collaborative virtual spaces, such as meeting rooms, offices, or clinics. Users who are unfamiliar with VR will have used their hands to open doors with passcode entry panels in the real world. Thus, while the user may not be familiar with VR, they would be familiar with the task of opening the door. Using a pilot dataset consisting of 7 users familiar with VR, and 7 not familiar with VR, we acquire highest accuracy of 88.03\% when 6 test users, 3 familiar and 3 not familiar, are evaluated with classifiers trained using data from the remaining 8 users. Our results indicate potential for using user movement data to detect familiarity for the simple yet important task of secure passcode-based access.
- [1658] arXiv:2401.16444 (cross-list from cs.HC) [ pdf , other ]
-
Title: Enhancing Human Experience in Human-Agent Collaboration: A Human-Centered Modeling Approach Based on Positive Human GainAuthors: Yiming Gao , Feiyu Liu , Liang Wang , Zhenjie Lian , Dehua Zheng , Weixuan Wang , Wenjin Yang , Siqin Li , Xianliang Wang , Wenhui Chen , Jing Dai , Qiang Fu , Wei Yang , Lanxiao Huang , Wei LiuComments: Accepted at ICLR 2024. arXiv admin note: text overlap with arXiv:2304.11632Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Existing game AI research mainly focuses on enhancing agents' abilities to win games, but this does not inherently make humans have a better experience when collaborating with these agents. For example, agents may dominate the collaboration and exhibit unintended or detrimental behaviors, leading to poor experiences for their human partners. In other words, most game AI agents are modeled in a "self-centered" manner. In this paper, we propose a "human-centered" modeling scheme for collaborative agents that aims to enhance the experience of humans. Specifically, we model the experience of humans as the goals they expect to achieve during the task. We expect that agents should learn to enhance the extent to which humans achieve these goals while maintaining agents' original abilities (e.g., winning games). To achieve this, we propose the Reinforcement Learning from Human Gain (RLHG) approach. The RLHG approach introduces a "baseline", which corresponds to the extent to which humans primitively achieve their goals, and encourages agents to learn behaviors that can effectively enhance humans in achieving their goals better. We evaluate the RLHG agent in the popular Multi-player Online Battle Arena (MOBA) game, Honor of Kings, by conducting real-world human-agent tests. Both objective performance and subjective preference results show that the RLHG agent provides participants better gaming experience.
- [1659] arXiv:2401.16448 (cross-list from cs.AR) [ pdf , other ]
-
Title: LLM4SecHW: Leveraging Domain Specific Large Language Model for Hardware DebuggingComments: 6 pages. 1 figureJournal-ref: 2023 Asian Hardware Oriented Security and Trust Symposium (AsianHOST), Tianjin, China, 2023, pp. 1-6Subjects: Hardware Architecture (cs.AR) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents LLM4SecHW, a novel framework for hardware debugging that leverages domain specific Large Language Model (LLM). Despite the success of LLMs in automating various software development tasks, their application in the hardware security domain has been limited due to the constraints of commercial LLMs and the scarcity of domain specific data. To address these challenges, we propose a unique approach to compile a dataset of open source hardware design defects and their remediation steps, utilizing version control data. This dataset provides a substantial foundation for training machine learning models for hardware. LLM4SecHW employs fine tuning of medium sized LLMs based on this dataset, enabling the identification and rectification of bugs in hardware designs. This pioneering approach offers a reference workflow for the application of fine tuning domain specific LLMs in other research areas. We evaluate the performance of our proposed system on various open source hardware designs, demonstrating its efficacy in accurately identifying and correcting defects. Our work brings a new perspective on automating the quality control process in hardware design.
- [1660] arXiv:2401.16450 (cross-list from cs.HC) [ pdf , other ]
-
Title: ACCESS: Prompt Engineering for Automated Web Accessibility Violation CorrectionsAuthors: Calista Huang , Alyssa Ma , Suchir Vyasamudri , Eugenie Puype , Sayem Kamal , Juan Belza Garcia , Salar Cheema , Michael LutzComments: 11 pages, 6 figuresSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Abstract: With the increasing need for inclusive and user-friendly technology, web accessibility is crucial to ensuring equal access to online content for individuals with disabilities, including visual, auditory, cognitive, or motor impairments. Despite the existence of accessibility guidelines and standards such as Web Content Accessibility Guidelines (WCAG) and the Web Accessibility Initiative (W3C), over 90% of websites still fail to meet the necessary accessibility requirements. For web users with disabilities, there exists a need for a tool to automatically fix web page accessibility errors. While research has demonstrated methods to find and target accessibility errors, no research has focused on effectively correcting such violations. This paper presents a novel approach to correcting accessibility violations on the web by modifying the document object model (DOM) in real time with foundation models. Leveraging accessibility error information, large language models (LLMs), and prompt engineering techniques, we achieved greater than a 51% reduction in accessibility violation errors after corrections on our novel benchmark: ACCESS. Our work demonstrates a valuable approach toward the direction of inclusive web content, and provides directions for future research to explore advanced methods to automate web accessibility.
- [1661] arXiv:2401.16452 (cross-list from cs.LG) [ pdf , other ]
-
Title: Context-Former: Stitching via Latent Conditioned Sequence ModelingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Offline reinforcement learning (RL) algorithms can improve the decision making via stitching sub-optimal trajectories to obtain more optimal ones. This capability is a crucial factor in enabling RL to learn policies that are superior to the behavioral policy. On the other hand, Decision Transformer (DT) abstracts the decision-making as sequence modeling, showcasing competitive performance on offline RL benchmarks, however, recent studies demonstrate that DT lacks of stitching capability, thus exploit stitching capability for DT is vital to further improve its performance. In order to endow stitching capability to DT, we abstract trajectory stitching as expert matching and introduce our approach, ContextFormer, which integrates contextual information-based imitation learning (IL) and sequence modeling to stitch sub-optimal trajectory fragments by emulating the representations of a limited number of expert trajectories. To validate our claim, we conduct experiments from two perspectives: 1) We conduct extensive experiments on D4RL benchmarks under the settings of IL, and experimental results demonstrate ContextFormer can achieve competitive performance in multi-IL settings. 2) More importantly, we conduct a comparison of ContextFormer with diverse competitive DT variants using identical training datasets. The experimental results unveiled ContextFormer's superiority, as it outperformed all other variants, showcasing its remarkable performance.
- [1662] arXiv:2401.16453 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Hybrid Transformer and Spatial-Temporal Self-Supervised Learning for Long-term Traffic PredictionComments: 22 pages, 10 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Long-term traffic prediction has always been a challenging task due to its dynamic temporal dependencies and complex spatial dependencies. In this paper, we propose a model that combines hybrid Transformer and spatio-temporal self-supervised learning. The model enhances its robustness by applying adaptive data augmentation techniques at the sequence-level and graph-level of the traffic data. It utilizes Transformer to overcome the limitations of recurrent neural networks in capturing long-term sequences, and employs Chebyshev polynomial graph convolution to capture complex spatial dependencies. Furthermore, considering the impact of spatio-temporal heterogeneity on traffic speed, we design two self-supervised learning tasks to model the temporal and spatial heterogeneity, thereby improving the accuracy and generalization ability of the model. Experimental evaluations are conducted on two real-world datasets, PeMS04 and PeMS08, and the results are visualized and analyzed, demonstrating the superior performance of the proposed model.
- [1663] arXiv:2401.16454 (cross-list from cs.HC) [ pdf , other ]
-
Title: KAUCUS: Knowledge Augmented User Simulators for Training Language Model AssistantsAuthors: Kaustubh D. DholeComments: Simulation of Conversational Intelligence in Chat, EACL 2024Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Abstract: An effective multi-turn instruction-following assistant can be developed by creating a simulator that can generate useful interaction data. Apart from relying on its intrinsic weights, an ideal user simulator should also be able to bootstrap external knowledge rapidly in its raw form to simulate the multifarious diversity of text available over the internet. Previous user simulators generally lacked diversity, were mostly closed domain, and necessitated rigid schema making them inefficient to rapidly scale to incorporate external knowledge. In this regard, we introduce, Kaucus, a Knowledge-Augmented User Simulator framework, to outline a process of creating diverse user simulators, that can seamlessly exploit external knowledge as well as benefit downstream assistant model training. Through two GPT-J based simulators viz., a Retrieval Augmented Simulator and a Summary Controlled Simulator we generate diverse simulator-assistant interactions. Through reward and preference model-based evaluations, we find that these interactions serve as useful training data and create more helpful downstream assistants. We also find that incorporating knowledge through retrieval augmentation or summary control helps create better assistants.
- [1664] arXiv:2401.16457 (cross-list from cs.LG) [ pdf , other ]
-
Title: Effective Controllable Bias Mitigation for Classification and Retrieval using Gate AdaptersComments: Paper is accepted to main proceedings of EACL 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Bias mitigation of Language Models has been the topic of many studies with a recent focus on learning separate modules like adapters for on-demand debiasing. Besides optimizing for a modularized debiased model, it is often critical in practice to control the degree of bias reduction at inference time, e.g., in order to tune for a desired performance-fairness trade-off in search results or to control the strength of debiasing in classification tasks. In this paper, we introduce Controllable Gate Adapter (ConGater), a novel modular gating mechanism with adjustable sensitivity parameters, which allows for a gradual transition from the biased state of the model to the fully debiased version at inference time. We demonstrate ConGater performance by (1) conducting adversarial debiasing experiments with three different models on three classification tasks with four protected attributes, and (2) reducing the bias of search results through fairness list-wise regularization to enable adjusting a trade-off between performance and fairness metrics. Our experiments on the classification tasks show that compared to baselines of the same caliber, ConGater can maintain higher task performance while containing less information regarding the attributes. Our results on the retrieval task show that the fully debiased ConGater can achieve the same fairness performance while maintaining more than twice as high task performance than recent strong baselines. Overall, besides strong performance ConGater enables the continuous transitioning between biased and debiased states of models, enhancing personalization of use and interpretability through controllability.
- [1665] arXiv:2401.16461 (cross-list from cs.MA) [ pdf , other ]
-
Title: Norm Enforcement with a Soft Touch: Faster Emergence, Happier AgentsComments: 12 pages, 11 figures, 5 tables (and supplementary material with code availability and additional results), accepted at AAMAS 2024Subjects: Multiagent Systems (cs.MA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: A multiagent system is a society of autonomous agents whose interactions can be regulated via social norms. In general, the norms of a society are not hardcoded but emerge from the agents' interactions. Specifically, how the agents in a society react to each other's behavior and respond to the reactions of others determines which norms emerge in the society. We think of these reactions by an agent to the satisfactory or unsatisfactory behaviors of another agent as communications from the first agent to the second agent. Understanding these communications is a kind of social intelligence: these communications provide natural drivers for norm emergence by pushing agents toward certain behaviors, which can become established as norms. Whereas it is well-known that sanctioning can lead to the emergence of norms, we posit that a broader kind of social intelligence can prove more effective in promoting cooperation in a multiagent system.
Accordingly, we develop Nest, a framework that models social intelligence via a wider variety of communications and understanding of them than in previous work. To evaluate Nest, we develop a simulated pandemic environment and conduct simulation experiments to compare Nest with baselines considering a combination of three kinds of social communication: sanction, tell, and hint.
We find that societies formed of Nest agents achieve norms faster. Moreover, Nest agents effectively avoid undesirable consequences, which are negative sanctions and deviation from goals, and yield higher satisfaction for themselves than baseline agents despite requiring only an equivalent amount of information. - [1666] arXiv:2401.16462 (cross-list from cs.LG) [ pdf , other ]
-
Title: Supervised Contrastive Learning based Dual-Mixer Model for Remaining Useful Life PredictionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The problem of the Remaining Useful Life (RUL) prediction, aiming at providing an accurate estimate of the remaining time from the current predicting moment to the complete failure of the device, has gained significant attention from researchers in recent years. In this paper, to overcome the shortcomings of rigid combination for temporal and spatial features in most existing RUL prediction approaches, a spatial-temporal homogeneous feature extractor, named Dual-Mixer model, is firstly proposed. Flexible layer-wise progressive feature fusion is employed to ensure the homogeneity of spatial-temporal features and enhance the prediction accuracy. Secondly, the Feature Space Global Relationship Invariance (FSGRI) training method is introduced based on supervised contrastive learning. This method maintains the consistency of relationships among sample features with their degradation patterns during model training, simplifying the subsequently regression task in the output layer and improving the model's performance in RUL prediction. Finally, the effectiveness of the proposed method is validated through comparisons with other latest research works on the C-MAPSS dataset. The Dual-Mixer model demonstrates superiority across most metrics, while the FSGRI training method shows an average improvement of 7.00% and 2.41% in RMSE and MAPE, respectively, for all baseline models. Our experiments and model code are publicly available at this https URL .
- [1667] arXiv:2401.16467 (cross-list from cs.SE) [ pdf , other ]
-
Title: ReGAL: Refactoring Programs to Discover Generalizable AbstractionsComments: 18 pages; First two authors contributed equally; Code: this https URLSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Programming Languages (cs.PL)
Abstract: While large language models (LLMs) are increasingly being used for program synthesis, they lack the global view needed to develop useful abstractions; they generally predict programs one at a time, often repeating the same functionality. Generating redundant code from scratch is both inefficient and error-prone. To address this, we propose Refactoring for Generalizable Abstraction Learning (ReGAL), a gradient-free method for learning a library of reusable functions via code refactorization, i.e. restructuring code without changing its execution output. ReGAL learns from a small set of existing programs, iteratively verifying and refining its abstractions via execution. We find that the shared function libraries discovered by ReGAL make programs easier to predict across diverse domains. On three datasets (LOGO graphics generation, Date reasoning, and TextCraft, a Minecraft-based text game), both open-source and proprietary LLMs improve in accuracy when predicting programs with ReGAL functions. For CodeLlama-13B, ReGAL results in absolute accuracy increases of 11.5% on graphics, 26.1% on date understanding, and 8.1% on TextCraft, outperforming GPT-3.5 in two of three domains. Our analysis reveals ReGAL's abstractions encapsulate frequently-used subroutines as well as environment dynamics.
- [1668] arXiv:2401.16501 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: AFSD-Physics: Exploring the governing equations of temperature evolution during additive friction stir deposition by a human-AI teaming approachSubjects: Machine Learning (cs.LG) ; Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI)
Abstract: This paper presents a modeling effort to explore the underlying physics of temperature evolution during additive friction stir deposition (AFSD) by a human-AI teaming approach. AFSD is an emerging solid-state additive manufacturing technology that deposits materials without melting. However, both process modeling and modeling of the AFSD tool are at an early stage. In this paper, a human-AI teaming approach is proposed to combine models based on first principles with AI. The resulting human-informed machine learning method, denoted as AFSD-Physics, can effectively learn the governing equations of temperature evolution at the tool and the build from in-process measurements. Experiments are designed and conducted to collect in-process measurements for the deposition of aluminum 7075 with a total of 30 layers. The acquired governing equations are physically interpretable models with low computational cost and high accuracy. Model predictions show good agreement with the measurements. Experimental validation with new process parameters demonstrates the model's generalizability and potential for use in tool temperature control and process optimization.
- [1669] arXiv:2401.16521 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Validation, Robustness, and Accuracy of Perturbation-Based Sensitivity Analysis Methods for Time-Series Deep Learning ModelsAuthors: Zhengguang WangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This work undertakes studies to evaluate Interpretability Methods for Time-Series Deep Learning. Sensitivity analysis assesses how input changes affect the output, constituting a key component of interpretation. Among the post-hoc interpretation methods such as back-propagation, perturbation, and approximation, my work will investigate perturbation-based sensitivity Analysis methods on modern Transformer models to benchmark their performances. Specifically, my work answers three research questions: 1) Do different sensitivity analysis (SA) methods yield comparable outputs and attribute importance rankings? 2) Using the same sensitivity analysis method, do different Deep Learning (DL) models impact the output of the sensitivity analysis? 3) How well do the results from sensitivity analysis methods align with the ground truth?
- [1670] arXiv:2401.16541 (cross-list from cs.CL) [ pdf , other ]
-
Title: GuReT: Distinguishing Guilt and Regret related TextAuthors: Sabur Butt , Fazlourrahman Balouchzahi , Abdul Gafar Manuel Meque , Maaz Amjad , Hector G. Ceballos Cancino , Grigori Sidorov , Alexander GelbukhSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The intricate relationship between human decision-making and emotions, particularly guilt and regret, has significant implications on behavior and well-being. Yet, these emotions subtle distinctions and interplay are often overlooked in computational models. This paper introduces a dataset tailored to dissect the relationship between guilt and regret and their unique textual markers, filling a notable gap in affective computing research. Our approach treats guilt and regret recognition as a binary classification task and employs three machine learning and six transformer-based deep learning techniques to benchmark the newly created dataset. The study further implements innovative reasoning methods like chain-of-thought and tree-of-thought to assess the models interpretive logic. The results indicate a clear performance edge for transformer-based models, achieving a 90.4% macro F1 score compared to the 85.3% scored by the best machine learning classifier, demonstrating their superior capability in distinguishing complex emotional states.
- [1671] arXiv:2401.16553 (cross-list from cs.CL) [ pdf , other ]
-
Title: SelectLLM: Can LLMs Select Important Instructions to Annotate?Comments: First Authors: Ritik Sachin Parkar and Jaehyung Kim | Second Author: Jong Inn Park | PI: Dongyeop KangSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Instruction tuning benefits from large and diverse datasets, however creating such datasets involves a high cost of human labeling. While synthetic datasets generated by large language models (LLMs) have partly solved this issue, they often contain low-quality data. One effective solution is selectively annotating unlabelled instructions, especially given the relative ease of acquiring unlabeled instructions or texts from various sources. However, how to select unlabelled instructions is not well-explored, especially in the context of LLMs. Further, traditional data selection methods, relying on input embedding space density, tend to underestimate instruction sample complexity, whereas those based on model prediction uncertainty often struggle with synthetic label quality. Therefore, we introduce SelectLLM, an alternative framework that leverages the capabilities of LLMs to more effectively select unlabeled instructions. SelectLLM consists of two key steps: Coreset-based clustering of unlabelled instructions for diversity and then prompting a LLM to identify the most beneficial instructions within each cluster. Our experiments demonstrate that SelectLLM matches or outperforms other state-of-the-art methods in instruction tuning benchmarks. It exhibits remarkable consistency across human and synthetic datasets, along with better cross-dataset generalization, as evidenced by a 10% performance improvement on the Cleaned Alpaca test set when trained on Dolly data. All code and data are publicly available ( this https URL ).
- [1672] arXiv:2401.16561 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Multi-class Regret Detection in Hindi Devanagari ScriptSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: The number of Hindi speakers on social media has increased dramatically in recent years. Regret is a common emotional experience in our everyday life. Many speakers on social media, share their regretful experiences and opinions regularly. It might cause a re-evaluation of one's choices and a desire to make a different option if given the chance. As a result, knowing the source of regret is critical for investigating its impact on behavior and decision-making. This study focuses on regret and how it is expressed, specifically in Hindi, on various social media platforms. In our study, we present a novel dataset from three different sources, where each sentence has been manually classified into one of three classes "Regret by action", "Regret by inaction", and "No regret". Next, we use this dataset to investigate the linguistic expressions of regret in Hindi text and also identify the textual domains that are most frequently associated with regret. Our findings indicate that individuals on social media platforms frequently express regret for both past inactions and actions, particularly within the domain of interpersonal relationships. We use a pre-trained BERT model to generate word embeddings for the Hindi dataset and also compare deep learning models with conventional machine learning models in order to demonstrate accuracy. Our results show that BERT embedding with CNN consistently surpassed other models. This described the effectiveness of BERT for conveying the context and meaning of words in the regret domain.
- [1673] arXiv:2401.16569 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Autoencoder-Based Domain Learning for Semantic Communication with Conceptual SpacesComments: 6 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Image and Video Processing (eess.IV)
Abstract: Communication with the goal of accurately conveying meaning, rather than accurately transmitting symbols, has become an area of growing interest. This paradigm, termed semantic communication, typically leverages modern developments in artificial intelligence and machine learning to improve the efficiency and robustness of communication systems. However, a standard model for capturing and quantifying the details of "meaning" is lacking, with many leading approaches to semantic communication adopting a black-box framework with little understanding of what exactly the model is learning. One solution is to utilize the conceptual spaces framework, which models meaning explicitly in a geometric manner. Though prior work studying semantic communication with conceptual spaces has shown promising results, these previous attempts involve hand-crafting a conceptual space model, severely limiting the scalability and practicality of the approach. In this work, we develop a framework for learning a domain of a conceptual space model using only the raw data with high-level property labels. In experiments using the MNIST and CelebA datasets, we show that the domains learned using the framework maintain semantic similarity relations and possess interpretable dimensions.
- [1674] arXiv:2401.16577 (cross-list from cs.CL) [ pdf , other ]
-
Title: LLMs as On-demand Customizable ServiceAuthors: Souvika Sarkar , Mohammad Fakhruddin Babar , Monowar Hasan , Shubhra Kanti Karmaker (Santu)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have demonstrated remarkable language understanding and generation capabilities. However, training, deploying, and accessing these models pose notable challenges, including resource-intensive demands, extended training durations, and scalability issues. To address these issues, we introduce a concept of hierarchical, distributed LLM architecture that aims at enhancing the accessibility and deployability of LLMs across heterogeneous computing platforms, including general-purpose computers (e.g., laptops) and IoT-style devices (e.g., embedded systems). By introducing a "layered" approach, the proposed architecture enables on-demand accessibility to LLMs as a customizable service. This approach also ensures optimal trade-offs between the available computational resources and the user's application needs. We envision that the concept of hierarchical LLM will empower extensive, crowd-sourced user bases to harness the capabilities of LLMs, thereby fostering advancements in AI technology in general.
- [1675] arXiv:2401.16578 (cross-list from cs.CL) [ pdf , other ]
-
Title: Leveraging Professional Radiologists' Expertise to Enhance LLMs' Evaluation for Radiology ReportsAuthors: Qingqing Zhu , Xiuying Chen , Qiao Jin , Benjamin Hou , Tejas Sudharshan Mathai , Pritam Mukherjee , Xin Gao , Ronald M Summers , Zhiyong LuSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: In radiology, Artificial Intelligence (AI) has significantly advanced report generation, but automatic evaluation of these AI-produced reports remains challenging. Current metrics, such as Conventional Natural Language Generation (NLG) and Clinical Efficacy (CE), often fall short in capturing the semantic intricacies of clinical contexts or overemphasize clinical details, undermining report clarity. To overcome these issues, our proposed method synergizes the expertise of professional radiologists with Large Language Models (LLMs), like GPT-3.5 and GPT-4 1. Utilizing In-Context Instruction Learning (ICIL) and Chain of Thought (CoT) reasoning, our approach aligns LLM evaluations with radiologist standards, enabling detailed comparisons between human and AI generated reports. This is further enhanced by a Regression model that aggregates sentence evaluation scores. Experimental results show that our "Detailed GPT-4 (5-shot)" model achieves a 0.48 score, outperforming the METEOR metric by 0.19, while our "Regressed GPT-4" model shows even greater alignment with expert evaluations, exceeding the best existing metric by a 0.35 margin. Moreover, the robustness of our explanations has been validated through a thorough iterative strategy. We plan to publicly release annotations from radiology experts, setting a new standard for accuracy in future assessments. This underscores the potential of our approach in enhancing the quality assessment of AI-driven medical reports.
- [1676] arXiv:2401.16587 (cross-list from cs.CL) [ pdf , other ]
-
Title: A Linguistic Comparison between Human and ChatGPT-Generated ConversationsComments: Preprint. Pending review and feedback from ICPRAI2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: This study explores linguistic differences between human and LLM-generated dialogues, using 19.5K dialogues generated by ChatGPT-3.5 as a companion to the EmpathicDialogues dataset. The research employs Linguistic Inquiry and Word Count (LIWC) analysis, comparing ChatGPT-generated conversations with human conversations across 118 linguistic categories. Results show greater variability and authenticity in human dialogues, but ChatGPT excels in categories such as social processes, analytical style, cognition, attentional focus, and positive emotional tone, reinforcing recent findings of LLMs being "more human than human." However, no significant difference was found in positive or negative affect between ChatGPT and human dialogues. Classifier analysis of dialogue embeddings indicates implicit coding of the valence of affect despite no explicit mention of affect in the conversations. The research also contributes a novel, companion ChatGPT-generated dataset of conversations between two independent chatbots, which were designed to replicate a corpus of human conversations available for open access and used widely in AI research on language modeling. Our findings increase understanding of ChatGPT's linguistic capabilities and inform ongoing efforts to distinguish between human and LLM-generated text, which is critical in detecting AI-generated fakes, misinformation, and disinformation.
- [1677] arXiv:2401.16618 (cross-list from cs.RO) [ pdf , other ]
-
Title: A comparison of RL-based and PID controllers for 6-DOF swimming robots: hybrid underwater object trackingSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: In this paper, we present an exploration and assessment of employing a centralized deep Q-network (DQN) controller as a substitute for the prevalent use of PID controllers in the context of 6DOF swimming robots. Our primary focus centers on illustrating this transition with the specific case of underwater object tracking. DQN offers advantages such as data efficiency and off-policy learning, while remaining simpler to implement than other reinforcement learning methods. Given the absence of a dynamic model for our robot, we propose an RL agent to control this multi-input-multi-output (MIMO) system, where a centralized controller may offer more robust control than distinct PIDs. Our approach involves initially using classical controllers for safe exploration, then gradually shifting to DQN to take full control of the robot.
We divide the underwater tracking task into vision and control modules. We use established methods for vision-based tracking and introduce a centralized DQN controller. By transmitting bounding box data from the vision module to the control module, we enable adaptation to various objects and effortless vision system replacement. Furthermore, dealing with low-dimensional data facilitates cost-effective online learning for the controller. Our experiments, conducted within a Unity-based simulator, validate the effectiveness of a centralized RL agent over separated PID controllers, showcasing the applicability of our framework for training the underwater RL agent and improved performance compared to traditional control methods. The code for both real and simulation implementations is at this https URL . - [1678] arXiv:2401.16633 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: I came, I saw, I certified: some perspectives on the safety assurance of cyber-physical systemsAuthors: Mithila Sivakumar , Alvine B. Belle , Kimya Khakzad Shahandashti , Oluwafemi Odu , Hadi Hemmati , Segla Kpodjedo , Song Wang , Opeyemi O. AdesinaSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: The execution failure of cyber-physical systems (e.g., autonomous driving systems, unmanned aerial systems, and robotic systems) could result in the loss of life, severe injuries, large-scale environmental damage, property destruction, and major economic loss. Hence, such systems usually require a strong justification that they will effectively support critical requirements (e.g., safety, security, and reliability) for which they were designed. Thus, it is often mandatory to develop compelling assurance cases to support that justification and allow regulatory bodies to certify such systems. In such contexts, detecting assurance deficits, relying on patterns to improve the structure of assurance cases, improving existing assurance case notations, and (semi-)automating the generation of assurance cases are key to develop compelling assurance cases and foster consumer acceptance. We therefore explore challenges related to such assurance enablers and outline some potential directions that could be explored to tackle them.
- [1679] arXiv:2401.16635 (cross-list from cs.LG) [ pdf , other ]
-
Title: Improving Reinforcement Learning from Human Feedback with Efficient Reward Model EnsembleSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Reinforcement Learning from Human Feedback (RLHF) is a widely adopted approach for aligning large language models with human values. However, RLHF relies on a reward model that is trained with a limited amount of human preference data, which could lead to inaccurate predictions. As a result, RLHF may produce outputs that are misaligned with human values. To mitigate this issue, we contribute a reward ensemble method that allows the reward model to make more accurate predictions. As using an ensemble of large language model-based reward models can be computationally and resource-expensive, we explore efficient ensemble methods including linear-layer ensemble and LoRA-based ensemble. Empirically, we run Best-of-$n$ and Proximal Policy Optimization with our ensembled reward models, and verify that our ensemble methods help improve the alignment performance of RLHF outputs.
- [1680] arXiv:2401.16638 (cross-list from cs.CL) [ pdf , other ]
-
Title: Breaking Free Transformer Models: Task-specific Context Attribution Promises Improved Generalizability Without Fine-tuning Pre-trained LLMsComments: 8 pages, 3 figures, 5 tables, To be published in 2024 AAAI workshop on Responsible Language Models (ReLM)Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Fine-tuning large pre-trained language models (LLMs) on particular datasets is a commonly employed strategy in Natural Language Processing (NLP) classification tasks. However, this approach usually results in a loss of models generalizability. In this paper, we present a framework that allows for maintaining generalizability, and enhances the performance on the downstream task by utilizing task-specific context attribution. We show that a linear transformation of the text representation from any transformer model using the task-specific concept operator results in a projection onto the latent concept space, referred to as context attribution in this paper. The specific concept operator is optimized during the supervised learning stage via novel loss functions. The proposed framework demonstrates that context attribution of the text representation for each task objective can improve the capacity of the discriminator function and thus achieve better performance for the classification task. Experimental results on three datasets, namely HateXplain, IMDB reviews, and Social Media Attributions, illustrate that the proposed model attains superior accuracy and generalizability. Specifically, for the non-fine-tuned BERT on the HateXplain dataset, we observe 8% improvement in accuracy and 10% improvement in F1-score. Whereas for the IMDB dataset, fine-tuned state-of-the-art XLNet is outperformed by 1% for both accuracy and F1-score. Furthermore, in an out-of-domain cross-dataset test, DistilBERT fine-tuned on the IMDB dataset in conjunction with the proposed model improves the F1-score on the HateXplain dataset by 7%. For the Social Media Attributions dataset of YouTube comments, we observe 5.2% increase in F1-metric. The proposed framework is implemented with PyTorch and provided open-source on GitHub.
- [1681] arXiv:2401.16646 (cross-list from cs.CL) [ pdf , other ]
-
Title: Incoherent Probability Judgments in Large Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Autoregressive Large Language Models (LLMs) trained for next-word prediction have demonstrated remarkable proficiency at producing coherent text. But are they equally adept at forming coherent probability judgments? We use probabilistic identities and repeated judgments to assess the coherence of probability judgments made by LLMs. Our results show that the judgments produced by these models are often incoherent, displaying human-like systematic deviations from the rules of probability theory. Moreover, when prompted to judge the same event, the mean-variance relationship of probability judgments produced by LLMs shows an inverted-U-shaped like that seen in humans. We propose that these deviations from rationality can be explained by linking autoregressive LLMs to implicit Bayesian inference and drawing parallels with the Bayesian Sampler model of human probability judgments.
- [1682] arXiv:2401.16650 (cross-list from cs.LG) [ pdf , other ]
-
Title: Augmenting Replay in World Models for Continual Reinforcement LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Continual RL is a challenging problem where the agent is exposed to a sequence of tasks; it should learn new tasks without forgetting old ones, and learning the new task should improve performance on previous and future tasks. The most common approaches use model-free RL algorithms as a base, and replay buffers have been used to overcome catastrophic forgetting. However, the buffers are often very large making scalability difficult. Also, the concept of replay comes from biological inspiration, where evidence suggests that replay is applied to a world model, which implies model-based RL -- and model-based RL should have benefits for continual RL, where it is possible to exploit knowledge independent of the policy. We present WMAR, World Models with Augmented Replay, a model-based RL algorithm with a world model and memory efficient distribution matching replay buffer. It is based on the well-known DreamerV3 algorithm, which has a simple FIFO buffer and was not tested in a continual RL setting. We evaluated WMAR vs WMAR (FIFO only) on tasks with and without shared structure from OpenAI ProcGen and Atari respectively, and without a task oracle. We found that WMAR has favourable properties on continual RL with significantly reduced computational overhead compared to WMAR (FIFO only). WMAR had small benefits over DreamerV3 on tasks with shared structure and substantially better forgetting characteristics on tasks without shared structure; but at the cost of lower plasticity seen in a lower maximum on new tasks. The results suggest that model-based RL using a world model with a memory efficient replay buffer can be an effective and practical approach to continual RL, justifying future work.
- [1683] arXiv:2401.16669 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Improving Global Weather and Ocean Wave Forecast with Large Artificial Intelligence ModelsAuthors: Fenghua Ling , Lin Ouyang , Boufeniza Redouane Larbi , Jing-Jia Luo , Tao Han , Xiaohui Zhong , Lei BaiSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Atmospheric and Oceanic Physics (physics.ao-ph); Geophysics (physics.geo-ph)
Abstract: The rapid advancement of artificial intelligence technologies, particularly in recent years, has led to the emergence of several large parameter artificial intelligence weather forecast models. These models represent a significant breakthrough, overcoming the limitations of traditional numerical weather prediction models and indicating the emergence of profound potential tools for atmosphere-ocean forecasts. This study explores the evolution of these advanced artificial intelligence forecast models, and based on the identified commonalities, proposes the "Three Large Rules" to measure their development. We discuss the potential of artificial intelligence in revolutionizing numerical weather prediction, and briefly outlining the underlying reasons for its great potential. While acknowledging the high accuracy, computational efficiency, and ease of deployment of large artificial intelligence forecast models, we also emphasize the irreplaceable values of traditional numerical forecasts and explore the challenges in the future development of large-scale artificial intelligence atmosphere-ocean forecast models. We believe that the optimal future of atmosphere-ocean weather forecast lies in achieving a seamless integration of artificial intelligence and traditional numerical models. Such a synthesis is anticipated to offer a more advanced and reliable approach for improved atmosphere-ocean forecasts. Additionally, we illustrate how forecasters can adapt and leverage the advanced artificial intelligence model through an example by building a large artificial intelligence model for global ocean wave forecast.
- [1684] arXiv:2401.16672 (cross-list from cs.IR) [ pdf , ps , other ]
-
Title: AutoIE: An Automated Framework for Information Extraction from Scientific LiteratureSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE)
Abstract: In the rapidly evolving field of scientific research, efficiently extracting key information from the burgeoning volume of scientific papers remains a formidable challenge. This paper introduces an innovative framework designed to automate the extraction of vital data from scientific PDF documents, enabling researchers to discern future research trajectories more readily. AutoIE uniquely integrates four novel components: (1) A multi-semantic feature fusion-based approach for PDF document layout analysis; (2) Advanced functional block recognition in scientific texts; (3) A synergistic technique for extracting and correlating information on molecular sieve synthesis; (4) An online learning paradigm tailored for molecular sieve literature. Our SBERT model achieves high Marco F1 scores of 87.19 and 89.65 on CoNLL04 and ADE datasets. In addition, a practical application of AutoIE in the petrochemical molecular sieve synthesis domain demonstrates its efficacy, evidenced by an impressive 78\% accuracy rate. This research paves the way for enhanced data management and interpretation in molecular sieve synthesis. It is a valuable asset for seasoned experts and newcomers in this specialized field.
- [1685] arXiv:2401.16708 (cross-list from cs.LG) [ pdf , other ]
-
Title: Multivariate Beta Mixture Model: Probabilistic Clustering With Flexible Cluster ShapesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces the multivariate beta mixture model (MBMM), a new probabilistic model for soft clustering. MBMM adapts to diverse cluster shapes because of the flexible probability density function of the multivariate beta distribution. We introduce the properties of MBMM, describe the parameter learning procedure, and present the experimental results, showing that MBMM fits diverse cluster shapes on synthetic and real datasets. The code is released anonymously at this https URL .
- [1686] arXiv:2401.16731 (cross-list from cs.CL) [ pdf , other ]
-
Title: Towards Generating Informative Textual Description for Neurons in Language ModelsComments: AAAI 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Recent developments in transformer-based language models have allowed them to capture a wide variety of world knowledge that can be adapted to downstream tasks with limited resources. However, what pieces of information are understood in these models is unclear, and neuron-level contributions in identifying them are largely unknown. Conventional approaches in neuron explainability either depend on a finite set of pre-defined descriptors or require manual annotations for training a secondary model that can then explain the neurons of the primary model. In this paper, we take BERT as an example and we try to remove these constraints and propose a novel and scalable framework that ties textual descriptions to neurons. We leverage the potential of generative language models to discover human-interpretable descriptors present in a dataset and use an unsupervised approach to explain neurons with these descriptors. Through various qualitative and quantitative analyses, we demonstrate the effectiveness of this framework in generating useful data-specific descriptors with little human involvement in identifying the neurons that encode these descriptors. In particular, our experiment shows that the proposed approach achieves 75% precision@2, and 50% recall@2
- [1687] arXiv:2401.16742 (cross-list from cs.HC) [ pdf , other ]
-
Title: Generative AI-based closed-loop fMRI systemAuthors: Mikihiro Kasahara , Taiki Oka , Vincent Taschereau-Dumouchel , Mitsuo Kawato , Hiroki Takakura , Aurelio CorteseSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Abstract: While generative AI is now widespread and useful in society, there are potential risks of misuse, e.g., unconsciously influencing cognitive processes or decision-making. Although this causes a security problem in the cognitive domain, there has been no research about neural and computational mechanisms counteracting the impact of malicious generative AI in humans. We propose DecNefGAN, a novel framework that combines a generative adversarial system and a neural reinforcement model. More specifically, DecNefGAN bridges human and generative AI in a closed-loop system, with the AI creating stimuli that induce specific mental states, thus exerting external control over neural activity. The objective of the human is the opposite, to compete and reach an orthogonal mental state. This framework can contribute to elucidating how the human brain responds to and counteracts the potential influence of generative AI.
- [1688] arXiv:2401.16755 (cross-list from cs.LG) [ pdf , other ]
-
Title: Diffusion model for relational inferenceSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Dynamical behaviors of complex interacting systems, including brain activities, financial price movements, and physical collective phenomena, are associated with underlying interactions between the system's components. The issue of uncovering interaction relations in such systems using observable dynamics is called relational inference. In this study, we propose a Diffusion model for Relational Inference (DiffRI), inspired by a self-supervised method for probabilistic time series imputation. DiffRI learns to infer the probability of the presence of connections between components through conditional diffusion modeling. Experiments on both simulated and quasi-real datasets show that DiffRI is highly competent compared with other state-of-the-art models in discovering ground truth interactions in an unsupervised manner. Our code will be made public soon.
- [1689] arXiv:2401.16757 (cross-list from cs.LG) [ pdf , other ]
-
Title: SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory BudgetComments: 14 pages, 19 figures, accepted by IEEE Transactions on Mobile ComputingSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Abstract: Executing deep neural networks (DNNs) on edge artificial intelligence (AI) devices enables various autonomous mobile computing applications. However, the memory budget of edge AI devices restricts the number and complexity of DNNs allowed in such applications. Existing solutions, such as model compression or cloud offloading, reduce the memory footprint of DNN inference at the cost of decreased model accuracy or autonomy. To avoid these drawbacks, we divide DNN into blocks and swap them in and out in order, such that large DNNs can execute within a small memory budget. Nevertheless, naive swapping on edge AI devices induces significant delays due to the redundant memory operations in the DNN development ecosystem for edge AI devices. To this end, we develop SwapNet, an efficient DNN block swapping middleware for edge AI devices. We systematically eliminate the unnecessary memory operations during block swapping while retaining compatible with the deep learning frameworks, GPU backends, and hardware architectures of edge AI devices. We further showcase the utility of SwapNet via a multi-DNN scheduling scheme. Evaluations on eleven DNN inference tasks in three applications demonstrate that SwapNet achieves almost the same latency as the case with sufficient memory even when DNNs demand 2.32x to 5.81x memory beyond the available budget. The design of SwapNet also provides novel and feasible insights for deploying large language models (LLMs) on edge AI devices in the future.
- [1690] arXiv:2401.16765 (cross-list from cs.CR) [ pdf , other ]
-
Title: A Cross-Language Investigation into Jailbreak Attacks in Large Language ModelsAuthors: Jie Li , Yi Liu , Chongyang Liu , Ling Shi , Xiaoning Ren , Yaowen Zheng , Yang Liu , Yinxing XueSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs) have become increasingly popular for their advanced text generation capabilities across various domains. However, like any software, they face security challenges, including the risk of 'jailbreak' attacks that manipulate LLMs to produce prohibited content. A particularly underexplored area is the Multilingual Jailbreak attack, where malicious questions are translated into various languages to evade safety filters. Currently, there is a lack of comprehensive empirical studies addressing this specific threat.
To address this research gap, we conducted an extensive empirical study on Multilingual Jailbreak attacks. We developed a novel semantic-preserving algorithm to create a multilingual jailbreak dataset and conducted an exhaustive evaluation on both widely-used open-source and commercial LLMs, including GPT-4 and LLaMa. Additionally, we performed interpretability analysis to uncover patterns in Multilingual Jailbreak attacks and implemented a fine-tuning mitigation method. Our findings reveal that our mitigation strategy significantly enhances model defense, reducing the attack success rate by 96.2%. This study provides valuable insights into understanding and mitigating Multilingual Jailbreak attacks. - [1691] arXiv:2401.16766 (cross-list from cs.LG) [ pdf , other ]
-
Title: Detection and Recovery Against Deep Neural Network Fault Injection Attacks Based on Contrastive LearningComments: Published in AdvML 2021Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Deep Neural Network (DNN) models when implemented on executing devices as the inference engines are susceptible to Fault Injection Attacks (FIAs) that manipulate model parameters to disrupt inference execution with disastrous performance. This work introduces Contrastive Learning (CL) of visual representations i.e., a self-supervised learning approach into the deep learning training and inference pipeline to implement DNN inference engines with self-resilience under FIAs. Our proposed CL based FIA Detection and Recovery (CFDR) framework features (i) real-time detection with only a single batch of testing data and (ii) fast recovery effective even with only a small amount of unlabeled testing data. Evaluated with the CIFAR-10 dataset on multiple types of FIAs, our CFDR shows promising detection and recovery effectiveness.
- [1692] arXiv:2401.16772 (cross-list from cs.LG) [ pdf , other ]
-
Title: Extrinsicaly Rewarded Soft Q Imitation Learning with DiscriminatorComments: 9 pages, 4 figures. arXiv admin note: text overlap with arXiv:2001.06808Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Imitation learning is often used in addition to reinforcement learning in environments where reward design is difficult or where the reward is sparse, but it is difficult to be able to imitate well in unknown states from a small amount of expert data and sampling data. Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift. The methods based on reinforcement learning, such as inverse reinforcement learning and Generative Adversarial imitation learning (GAIL), can learn from only a few expert data. However, they often need to interact with the environment. Soft Q imitation learning (SQIL) addressed the problems, and it was shown that it could learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards. In order to make this algorithm more robust to distribution shift, we propose more efficient and robust algorithm by adding to this method a reward function based on adversarial inverse reinforcement learning that rewards the agent for performing actions in status similar to the demo. We call this algorithm Discriminator Soft Q Imitation Learning (DSQIL). We evaluated it on MuJoCo environments.
- [1693] arXiv:2401.16784 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph Fairness Learning under Distribution ShiftsComments: Accepted by WWW 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: Graph neural networks (GNNs) have achieved remarkable performance on graph-structured data. However, GNNs may inherit prejudice from the training data and make discriminatory predictions based on sensitive attributes, such as gender and race. Recently, there has been an increasing interest in ensuring fairness on GNNs, but all of them are under the assumption that the training and testing data are under the same distribution, i.e., training data and testing data are from the same graph. Will graph fairness performance decrease under distribution shifts? How does distribution shifts affect graph fairness learning? All these open questions are largely unexplored from a theoretical perspective. To answer these questions, we first theoretically identify the factors that determine bias on a graph. Subsequently, we explore the factors influencing fairness on testing graphs, with a noteworthy factor being the representation distances of certain groups between the training and testing graph. Motivated by our theoretical analysis, we propose our framework FatraGNN. Specifically, to guarantee fairness performance on unknown testing graphs, we propose a graph generator to produce numerous graphs with significant bias and under different distributions. Then we minimize the representation distances for each certain group between the training graph and generated graphs. This empowers our model to achieve high classification and fairness performance even on generated graphs with significant bias, thereby effectively handling unknown testing graphs. Experiments on real-world and semi-synthetic datasets demonstrate the effectiveness of our model in terms of both accuracy and fairness.
- [1694] arXiv:2401.16788 (cross-list from cs.CL) [ pdf , other ]
-
Title: Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent DebateSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Despite the utility of Large Language Models (LLMs) across a wide range of tasks and scenarios, developing a method for reliably evaluating LLMs across varied contexts continues to be challenging. Modern evaluation approaches often use LLMs to assess responses generated by LLMs. However, the meta-evaluation conducted to assess the effectiveness of these LLMs as evaluators is typically constrained by the coverage of existing benchmarks or requires extensive human annotation. This underscores the urgency of methods for scalable meta-evaluation that can effectively, reliably, and efficiently evaluate the performance of LLMs as evaluators across diverse tasks and scenarios, particularly in potentially new, user-defined scenarios. To fill this gap, we propose ScaleEval, an agent-debate-assisted meta-evaluation framework that leverages the capabilities of multiple communicative LLM agents. This framework supports multi-round discussions to assist human annotators in discerning the most capable LLMs as evaluators, which significantly eases their workload in cases that used to require large-scale annotations during meta-evaluation. We release the code for our framework, which is publicly available at: \url{ this https URL }.
- [1695] arXiv:2401.16795 (cross-list from cs.LG) [ pdf , other ]
-
Title: Performance Insights-based AI-driven Football Transfer Fee PredictionAuthors: Daniil SulimovSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: We developed an artificial intelligence approach to predict the transfer fee of a football player. This model can help clubs make better decisions about which players to buy and sell, which can lead to improved performance and increased club budgets. Having collected data on player performance, transfer fees, and other factors that might affect a player's value, we then used this data to train a machine learning model that can accurately predict a player's impact on the game. We further passed the obtained results as one of the features to the predictor of transfer fees. The model can help clubs identify players who are undervalued and who could be sold for a profit. It can also help clubs avoid overpaying for players. We believe that our model can be a valuable tool for football clubs. It can help them make better decisions about player recruitment and transfers.
- [1696] arXiv:2401.16807 (cross-list from cs.IR) [ pdf , other ]
-
Title: Detecting LLM-Assisted Writing in Scientific Communication: Are We There Yet?Subjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Large Language Models (LLMs), exemplified by ChatGPT, have significantly reshaped text generation, particularly in the realm of writing assistance. While ethical considerations underscore the importance of transparently acknowledging LLM use, especially in scientific communication, genuine acknowledgment remains infrequent. A potential avenue to encourage accurate acknowledging of LLM-assisted writing involves employing automated detectors. Our evaluation of four cutting-edge LLM-generated text detectors reveals their suboptimal performance compared to a simple ad-hoc detector designed to identify abrupt writing style changes around the time of LLM proliferation. We contend that the development of specialized detectors exclusively dedicated to LLM-assisted writing detection is necessary. Such detectors could play a crucial role in fostering more authentic recognition of LLM involvement in scientific communication, addressing the current challenges in acknowledgment practices.
- [1697] arXiv:2401.16808 (cross-list from cs.LG) [ pdf , other ]
-
Title: Encoding Temporal Statistical-space Priors via Augmented RepresentationComments: pre-printSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Modeling time series data remains a pervasive issue as the temporal dimension is inherent to numerous domains. Despite significant strides in time series forecasting, high noise-to-signal ratio, non-normality, non-stationarity, and lack of data continue challenging practitioners. In response, we leverage a simple representation augmentation technique to overcome these challenges. Our augmented representation acts as a statistical-space prior encoded at each time step. In response, we name our method Statistical-space Augmented Representation (SSAR). The underlying high-dimensional data-generating process inspires our representation augmentation. We rigorously examine the empirical generalization performance on two data sets with two downstream temporal learning algorithms. Our approach significantly beats all five up-to-date baselines. Moreover, the highly modular nature of our approach can easily be applied to various settings. Lastly, fully-fledged theoretical perspectives are available throughout the writing for a clear and rigorous understanding.
- [1698] arXiv:2401.16867 (cross-list from cs.CV) [ pdf , other ]
-
Title: A Tournament of Transformation Models: B-Spline-based vs. Mesh-based Multi-Objective Deformable Image RegistrationAuthors: Georgios Andreadis , Joas I. Mulder , Anton Bouter , Peter A. N. Bosman , Tanja AlderliestenComments: Pre-print for the SPIE Medical Imaging: Image Processing ConferenceSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Abstract: The transformation model is an essential component of any deformable image registration approach. It provides a representation of physical deformations between images, thereby defining the range and realism of registrations that can be found. Two types of transformation models have emerged as popular choices: B-spline models and mesh models. Although both models have been investigated in detail, a direct comparison has not yet been made, since the models are optimized using very different optimization methods in practice. B-spline models are predominantly optimized using gradient-descent methods, while mesh models are typically optimized using finite-element method solvers or evolutionary algorithms. Multi-objective optimization methods, which aim to find a diverse set of high-quality trade-off registrations, are increasingly acknowledged to be important in deformable image registration. Since these methods search for a diverse set of registrations, they can provide a more complete picture of the capabilities of different transformation models, making them suitable for a comparison of models. In this work, we conduct the first direct comparison between B-spline and mesh transformation models, by optimizing both models with the same state-of-the-art multi-objective optimization method, the Multi-Objective Real-Valued Gene-pool Optimal Mixing Evolutionary Algorithm (MO-RV-GOMEA). The combination with B-spline transformation models, moreover, is novel. We experimentally compare both models on two different registration problems that are both based on pelvic CT scans of cervical cancer patients, featuring large deformations. Our results, on three cervical cancer patients, indicate that the choice of transformation model can have a profound impact on the diversity and quality of achieved registration outcomes.
- [1699] arXiv:2401.16889 (cross-list from cs.RO) [ pdf , other ]
-
Title: Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion ControlAuthors: Zhongyu Li , Xue Bin Peng , Pieter Abbeel , Sergey Levine , Glen Berseth , Koushil SreenathSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: This paper presents a comprehensive study on using deep reinforcement learning (RL) to create dynamic locomotion controllers for bipedal robots. Going beyond focusing on a single locomotion skill, we develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing. Our RL-based controller incorporates a novel dual-history architecture, utilizing both a long-term and short-term input/output (I/O) history of the robot. This control architecture, when trained through the proposed end-to-end RL approach, consistently outperforms other methods across a diverse range of skills in both simulation and the real world.The study also delves into the adaptivity and robustness introduced by the proposed RL system in developing locomotion controllers. We demonstrate that the proposed architecture can adapt to both time-invariant dynamics shifts and time-variant changes, such as contact events, by effectively using the robot's I/O history. Additionally, we identify task randomization as another key source of robustness, fostering better task generalization and compliance to disturbances. The resulting control policies can be successfully deployed on Cassie, a torque-controlled human-sized bipedal robot. This work pushes the limits of agility for bipedal robots through extensive real-world experiments. We demonstrate a diverse range of locomotion skills, including: robust standing, versatile walking, fast running with a demonstration of a 400-meter dash, and a diverse set of jumping skills, such as standing long jumps and high jumps.
- [1700] arXiv:2401.16960 (cross-list from cs.CL) [ pdf , other ]
-
Title: Two Heads Are Better Than One: Integrating Knowledge from Knowledge Graphs and Large Language Models for Entity AlignmentSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Entity alignment, which is a prerequisite for creating a more comprehensive Knowledge Graph (KG), involves pinpointing equivalent entities across disparate KGs. Contemporary methods for entity alignment have predominantly utilized knowledge embedding models to procure entity embeddings that encapsulate various similarities-structural, relational, and attributive. These embeddings are then integrated through attention-based information fusion mechanisms. Despite this progress, effectively harnessing multifaceted information remains challenging due to inherent heterogeneity. Moreover, while Large Language Models (LLMs) have exhibited exceptional performance across diverse downstream tasks by implicitly capturing entity semantics, this implicit knowledge has yet to be exploited for entity alignment. In this study, we propose a Large Language Model-enhanced Entity Alignment framework (LLMEA), integrating structural knowledge from KGs with semantic knowledge from LLMs to enhance entity alignment. Specifically, LLMEA identifies candidate alignments for a given entity by considering both embedding similarities between entities across KGs and edit distances to a virtual equivalent entity. It then engages an LLM iteratively, posing multiple multi-choice questions to draw upon the LLM's inference capability. The final prediction of the equivalent entity is derived from the LLM's output. Experiments conducted on three public datasets reveal that LLMEA surpasses leading baseline models. Additional ablation studies underscore the efficacy of our proposed framework.
- [1701] arXiv:2401.16974 (cross-list from cs.LG) [ pdf , other ]
-
Title: CORE: Towards Scalable and Efficient Causal Discovery with Reinforcement LearningComments: To be published In Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024), Auckland, New Zealand, May 6 - 10, 2024, IFAAMASSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Causal discovery is the challenging task of inferring causal structure from data. Motivated by Pearl's Causal Hierarchy (PCH), which tells us that passive observations alone are not enough to distinguish correlation from causation, there has been a recent push to incorporate interventions into machine learning research. Reinforcement learning provides a convenient framework for such an active approach to learning. This paper presents CORE, a deep reinforcement learning-based approach for causal discovery and intervention planning. CORE learns to sequentially reconstruct causal graphs from data while learning to perform informative interventions. Our results demonstrate that CORE generalizes to unseen graphs and efficiently uncovers causal structures. Furthermore, CORE scales to larger graphs with up to 10 variables and outperforms existing approaches in structure estimation accuracy and sample efficiency. All relevant code and supplementary material can be found at this https URL
- [1702] arXiv:2401.16982 (cross-list from cs.CR) [ pdf , other ]
-
Title: ActDroid: An active learning framework for Android malware detectionSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: The growing popularity of Android requires malware detection systems that can keep up with the pace of new software being released. According to a recent study, a new piece of malware appears online every 12 seconds. To address this, we treat Android malware detection as a streaming data problem and explore the use of active online learning as a means of mitigating the problem of labelling applications in a timely and cost-effective manner. Our resulting framework achieves accuracies of up to 96\%, requires as little of 24\% of the training data to be labelled, and compensates for concept drift that occurs between the release and labelling of an application. We also consider the broader practicalities of online learning within Android malware detection, and systematically explore the trade-offs between using different static, dynamic and hybrid feature sets to classify malware.
- [1703] arXiv:2401.17010 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: Finetuning Large Language Models for Vulnerability DetectionAuthors: Alexey Shestov , Rodion Levichev , Ravil Mussabayev , Evgeny Maslov , Anton Cheshkov , Pavel ZadorozhnySubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper presents the results of finetuning large language models (LLMs) for the task of detecting vulnerabilities in source code. We leverage WizardCoder, a recent improvement of the state-of-the-art LLM StarCoder, and adapt it for vulnerability detection through further finetuning. To accelerate training, we modify WizardCoder's training procedure, also we investigate optimal training regimes. For the imbalanced dataset with many more negative examples than positive, we also explore different techniques to improve classification performance. The finetuned WizardCoder model achieves improvement in ROC AUC and F1 measures on balanced and imbalanced vulnerability datasets over CodeBERT-like model, demonstrating the effectiveness of adapting pretrained LLMs for vulnerability detection in source code. The key contributions are finetuning the state-of-the-art code LLM, WizardCoder, increasing its training speed without the performance harm, optimizing the training procedure and regimes, handling class imbalance, and improving performance on difficult vulnerability detection datasets. This demonstrates the potential for transfer learning by finetuning large pretrained language models for specialized source code analysis tasks.
- [1704] arXiv:2401.17050 (cross-list from cs.CV) [ pdf , other ]
-
Title: ViTree: Single-path Neural Tree for Step-wise Interpretable Fine-grained Visual CategorizationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: As computer vision continues to advance and finds widespread applications across various domains, the need for interpretability in deep learning models becomes paramount. Existing methods often resort to post-hoc techniques or prototypes to explain the decision-making process, which can be indirect and lack intrinsic illustration. In this research, we introduce ViTree, a novel approach for fine-grained visual categorization that combines the popular vision transformer as a feature extraction backbone with neural decision trees. By traversing the tree paths, ViTree effectively selects patches from transformer-processed features to highlight informative local regions, thereby refining representations in a step-wise manner. Unlike previous tree-based models that rely on soft distributions or ensembles of paths, ViTree selects a single tree path, offering a clearer and simpler decision-making process. This patch and path selectivity enhances model interpretability of ViTree, enabling better insights into the model's inner workings. Remarkably, extensive experimentation validates that this streamlined approach surpasses various strong competitors and achieves state-of-the-art performance while maintaining exceptional interpretability which is proved by multi-perspective methods. Code can be found at this https URL .
- [1705] arXiv:2401.17053 (cross-list from cs.CV) [ pdf , other ]
-
Title: BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane ExtrapolationAuthors: Zhennan Wu , Yang Li , Han Yan , Taizhang Shang , Weixuan Sun , Senbo Wang , Ruikai Cui , Weizhe Liu , Hiroyuki Sato , Hongdong Li , Pan JiComments: Video: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: We present BlockFusion, a diffusion-based model that generates 3D scenes as unit blocks and seamlessly incorporates new blocks to extend the scene. BlockFusion is trained using datasets of 3D blocks that are randomly cropped from complete 3D scene meshes. Through per-block fitting, all training blocks are converted into the hybrid neural fields: with a tri-plane containing the geometry features, followed by a Multi-layer Perceptron (MLP) for decoding the signed distance values. A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed. Diffusion applied to the latent representations allows for high-quality and diverse 3D scene generation. To expand a scene during generation, one needs only to append empty blocks to overlap with the current scene and extrapolate existing latent tri-planes to populate new blocks. The extrapolation is done by conditioning the generation process with the feature samples from the overlapping tri-planes during the denoising iterations. Latent tri-plane extrapolation produces semantically and geometrically meaningful transitions that harmoniously blend with the existing scene. A 2D layout conditioning mechanism is used to control the placement and arrangement of scene elements. Experimental results indicate that BlockFusion is capable of generating diverse, geometrically consistent and unbounded large 3D scenes with unprecedented high-quality shapes in both indoor and outdoor scenarios.
- [1706] arXiv:2401.17095 (cross-list from cs.LG) [ pdf , other ]
-
Title: Traffic estimation in unobserved network locations using data-driven macroscopic modelsComments: 34 pages, 28 figures, 6 tablesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This paper leverages macroscopic models and multi-source spatiotemporal data collected from automatic traffic counters and probe vehicles to accurately estimate traffic flow and travel time in links where these measurements are unavailable. This problem is critical in transportation planning applications where the sensor coverage is low and the planned interventions have network-wide impacts. The proposed model, named the Macroscopic Traffic Estimator (MaTE), can perform network-wide estimations of traffic flow and travel time only using the set of observed measurements of these quantities. Because MaTE is grounded in macroscopic flow theory, all parameters and variables are interpretable. The estimated traffic flow satisfies fundamental flow conservation constraints and exhibits an increasing monotonic relationship with the estimated travel time. Using logit-based stochastic traffic assignment as the principle for routing flow behavior makes the model fully differentiable with respect to the model parameters. This property facilitates the application of computational graphs to learn parameters from vast amounts of spatiotemporal data. We also integrate neural networks and polynomial kernel functions to capture link flow interactions and enrich the mapping of traffic flows into travel times. MaTE also adds a destination choice model and a trip generation model that uses historical data on the number of trips generated by location. Experiments on synthetic data show that the model can accurately estimate travel time and traffic flow in out-of-sample links. Results obtained using real-world multi-source data from a large-scale transportation network suggest that MaTE outperforms data-driven benchmarks, especially in travel time estimation. The estimated parameters of MaTE are also informative about the hourly change in travel demand and supply characteristics of the transportation network.
- [1707] arXiv:2401.17123 (cross-list from cs.LG) [ pdf , other ]
-
Title: Unsupervised Discovery of Steerable Factors When Graph Deep Generative Models Are EntangledAuthors: Shengchao Liu , Chengpeng Wang , Jiarui Lu , Weili Nie , Hanchen Wang , Zhuoxinran Li , Bolei Zhou , Jian TangSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Quantitative Methods (q-bio.QM)
Abstract: Deep generative models (DGMs) have been widely developed for graph data. However, much less investigation has been carried out on understanding the latent space of such pretrained graph DGMs. These understandings possess the potential to provide constructive guidelines for crucial tasks, such as graph controllable generation. Thus in this work, we are interested in studying this problem and propose GraphCG, a method for the unsupervised discovery of steerable factors in the latent space of pretrained graph DGMs. We first examine the representation space of three pretrained graph DGMs with six disentanglement metrics, and we observe that the pretrained representation space is entangled. Motivated by this observation, GraphCG learns the steerable factors via maximizing the mutual information between semantic-rich directions, where the controlled graph moving along the same direction will share the same steerable factors. We quantitatively verify that GraphCG outperforms four competitive baselines on two graph DGMs pretrained on two molecule datasets. Additionally, we qualitatively illustrate seven steerable factors learned by GraphCG on five pretrained DGMs over five graph datasets, including two for molecules and three for point clouds.
- [1708] arXiv:2401.17129 (cross-list from cs.SD) [ pdf , other ]
-
Title: Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapesSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: This technical report details our work towards building an enhanced audio-visual sound event localization and detection (SELD) network. We build on top of the audio-only SELDnet23 model and adapt it to be audio-visual by merging both audio and video information prior to the gated recurrent unit (GRU) of the audio-only network. Our model leverages YOLO and DETIC object detectors. We also build a framework that implements audio-visual data augmentation and audio-visual synthetic data generation. We deliver an audio-visual SELDnet system that outperforms the existing audio-visual SELD baseline.
- [1709] arXiv:2401.17133 (cross-list from cs.SD) [ pdf , other ]
-
Title: A Proactive and Dual Prevention Mechanism against Illegal Song Covers empowered by Singing Voice ConversionSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Abstract: Singing voice conversion (SVC) automates song covers by converting one singer's singing voice into another target singer's singing voice with the original lyrics and melody. However, it raises serious concerns about copyright and civil right infringements to multiple entities. This work proposes SongBsAb, the first proactive approach to mitigate unauthorized SVC-based illegal song covers. SongBsAb introduces human-imperceptible perturbations to singing voices before releasing them, so that when they are used, the generation process of SVC will be interfered, resulting in unexpected singing voices. SongBsAb features a dual prevention effect by causing both (singer) identity disruption and lyric disruption, namely, the SVC-covered singing voice neither imitates the target singer nor preserves the original lyrics. To improve the imperceptibility of perturbations, we refine a psychoacoustic model-based loss with the backing track as an additional masker, a unique accompanying element for singing voices compared to ordinary speech voices. To enhance the transferability, we propose to utilize a frame-level interaction reduction-based loss. We demonstrate the prevention effectiveness, utility, and robustness of SongBsAb on three SVC models and two datasets using both objective and human study-based subjective metrics. Our work fosters an emerging research direction for mitigating illegal automated song covers.
- [1710] arXiv:2401.17139 (cross-list from cs.LG) [ pdf , other ]
-
Title: Large Language Model Evaluation via Matrix EntropySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Theory (cs.IT)
Abstract: Large language models (LLMs) have revolutionized the field of natural language processing, extending their strong capabilities into multi-modal domains. Thus, it is vital to define proper and diversified metrics for the evaluation of LLMs.
In this paper, we introduce matrix entropy, a novel metric rooted in information theory and geometry principles to quantify the data compression proficiency in LLMs. It reflects the model's ability to extract relevant information and eliminate unnecessary elements, thereby providing insight into the language model's intrinsic capability. Specifically, we demonstrate its applicability in both single-modal (language) and multi-modal settings. For language models, our findings reveal that the matrix entropy of representations follows a scaling law type reduction when the model scales up, serving as a complement to the traditional loss scaling law. For the multi-modal setting, we also propose an evaluation method based on matrix entropy for assessing alignment quality and we find that modern large multi-modal models exhibit great alignment performance. - [1711] arXiv:2401.17169 (cross-list from cs.CL) [ pdf , other ]
-
Title: Conditional and Modal Reasoning in Large Language ModelsComments: 11 pages, 16 figures. Code and data available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Abstract: The reasoning abilities of large language models (LLMs) are the topic of a growing body of research in artificial intelligence and cognitive science. In this paper, we probe the extent to which a dozen LLMs are able to distinguish logically correct inferences from logically fallacious ones. We focus on inference patterns involving conditionals (e.g., 'If Ann has a queen, then Bob has a jack') and epistemic modals (e.g., 'Ann might have an ace', 'Bob must have a king'). These inference patterns have been of special interest to logicians, philosophers, and linguists, since they plausibly play a central role in human reasoning. Assessing LLMs on these inference patterns is thus highly relevant to the question of how much the reasoning abilities of LLMs match those of humans. Among the LLMs we tested, all but GPT-4 often make basic mistakes with conditionals. Moreover, even GPT-4 displays logically inconsistent judgments across inference patterns involving epistemic modals.
- [1712] arXiv:2401.17173 (cross-list from cs.LG) [ pdf , other ]
-
Title: Zero-Shot Reinforcement Learning via Function EncodersSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Although reinforcement learning (RL) can solve many challenging sequential decision making problems, achieving zero-shot transfer across related tasks remains a challenge. The difficulty lies in finding a good representation for the current task so that the agent understands how it relates to previously seen tasks. To achieve zero-shot transfer, we introduce the function encoder, a representation learning algorithm which represents a function as a weighted combination of learned, non-linear basis functions. By using a function encoder to represent the reward function or the transition function, the agent has information on how the current task relates to previously seen tasks via a coherent vector representation. Thus, the agent is able to achieve transfer between related tasks at run time with no additional training. We demonstrate state-of-the-art data efficiency, asymptotic performance, and training stability in three RL fields by augmenting basic RL algorithms with a function encoder task representation.
- [1713] arXiv:2401.17178 (cross-list from cs.LG) [ pdf , other ]
-
Title: GraphViz2Vec: A Structure-aware Feature Generation Model to Improve Classification in GNNsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
Abstract: GNNs are widely used to solve various tasks including node classification and link prediction. Most of the GNN architectures assume the initial embedding to be random or generated from popular distributions. These initial embeddings require multiple layers of transformation to converge into a meaningful latent representation. While number of layers allow accumulation of larger neighbourhood of a node it also introduce the problem of over-smoothing. In addition, GNNs are inept at representing structural information. For example, the output embedding of a node does not capture its triangles participation. In this paper, we presented a novel feature extraction methodology GraphViz2Vec that can capture the structural information of a node's local neighbourhood to create meaningful initial embeddings for a GNN model. These initial embeddings helps existing models achieve state-of-the-art results in various classification tasks. Further, these initial embeddings help the model to produce desired results with only two layers which in turn reduce the problem of over-smoothing. The initial encoding of a node is obtained from an image classification model trained on multiple energy diagrams of its local neighbourhood. These energy diagrams are generated with the induced sub-graph of the nodes traversed by multiple random walks. The generated encodings increase the performance of existing models on classification tasks (with a mean increase of $4.65\%$ and $2.58\%$ for the node and link classification tasks, respectively), with some models achieving state-of-the-art results.
- [1714] arXiv:2401.17186 (cross-list from cs.CV) [ pdf , other ]
-
Title: Embracing Language Inclusivity and Diversity in CLIP through Continual Language LearningComments: Accepted by AAAI'2024, 15 pages (with appendix), 7 figures, 10 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: While vision-language pre-trained models (VL-PTMs) have advanced multimodal research in recent years, their mastery in a few languages like English restricts their applicability in broader communities. To this end, there is an increasing interest in developing multilingual VL models via a joint-learning setup, which, however, could be unrealistic due to expensive costs and data availability. In this work, we propose to extend VL-PTMs' language capacity by continual language learning (CLL), where a model needs to update its linguistic knowledge incrementally without suffering from catastrophic forgetting (CF). We begin our study by introducing a model dubbed CLL-CLIP, which builds upon CLIP, a prevailing VL-PTM that has acquired image-English text alignment. Specifically, CLL-CLIP contains an expandable token embedding layer to handle linguistic differences. It solely trains token embeddings to improve memory stability and is optimized under cross-modal and cross-lingual objectives to learn the alignment between images and multilingual texts. To alleviate CF raised by covariate shift and lexical overlap, we further propose a novel approach that ensures the identical distribution of all token embeddings during initialization and regularizes token embedding learning during training. We construct a CLL benchmark covering 36 languages based on MSCOCO and XM3600 datasets and then evaluate multilingual image-text retrieval performance. Extensive experiments verify the effectiveness of CLL-CLIP and show that our approach can boost CLL-CLIP, e.g., by 6.7% in text-to-image average Recall@1 on XM3600, and improve various state-of-the-art methods consistently. Our code and data are available at \url{ this https URL }.
- [1715] arXiv:2401.17188 (cross-list from cs.IT) [ pdf , other ]
-
Title: Nested Construction of Polar Codes via TransformersComments: 7 pages; 8 figuresSubjects: Information Theory (cs.IT) ; Artificial Intelligence (cs.AI)
Abstract: Tailoring polar code construction for decoding algorithms beyond successive cancellation has remained a topic of significant interest in the field. However, despite the inherent nested structure of polar codes, the use of sequence models in polar code construction is understudied. In this work, we propose using a sequence modeling framework to iteratively construct a polar code for any given length and rate under various channel conditions. Simulations show that polar codes designed via sequential modeling using transformers outperform both 5G-NR sequence and Density Evolution based approaches for both AWGN and Rayleigh fading channels.
- [1716] arXiv:2401.17200 (cross-list from cs.LG) [ pdf , other ]
-
Title: NormEnsembleXAI: Unveiling the Strengths and Weaknesses of XAI Ensemble TechniquesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: This paper presents a comprehensive comparative analysis of explainable artificial intelligence (XAI) ensembling methods. Our research brings three significant contributions. Firstly, we introduce a novel ensembling method, NormEnsembleXAI, that leverages minimum, maximum, and average functions in conjunction with normalization techniques to enhance interpretability. Secondly, we offer insights into the strengths and weaknesses of XAI ensemble methods. Lastly, we provide a library, facilitating the practical implementation of XAI ensembling, thus promoting the adoption of transparent and interpretable deep learning models.
- [1717] arXiv:2401.17221 (cross-list from cs.CV) [ pdf , other ]
-
Title: MouSi: Poly-Visual-Expert Vision-Language ModelsAuthors: Xiaoran Fan , Tao Ji , Changhao Jiang , Shuo Li , Senjie Jin , Sirui Song , Junke Wang , Boyang Hong , Lu Chen , Guodong Zheng , Ming Zhang , Caishuang Huang , Rui Zheng , Zhiheng Xi , Yuhao Zhou , Shihan Dou , Junjie Ye , Hang Yan , Tao Gui , Qi Zhang , Xipeng Qiu , Xuanjing Huang , Zuxuan Wu , Yu-Gang JiangSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Current large vision-language models (VLMs) often encounter challenges such as insufficient capabilities of a single visual component and excessively long visual tokens. These issues can limit the model's effectiveness in accurately interpreting complex visual information and over-lengthy contextual information. Addressing these challenges is crucial for enhancing the performance and applicability of VLMs. This paper proposes the use of ensemble experts technique to synergizes the capabilities of individual visual encoders, including those skilled in image-text matching, OCR, image segmentation, etc. This technique introduces a fusion network to unify the processing of outputs from different visual experts, while bridging the gap between image encoders and pre-trained LLMs. In addition, we explore different positional encoding schemes to alleviate the waste of positional encoding caused by lengthy image feature sequences, effectively addressing the issue of position overflow and length limitations. For instance, in our implementation, this technique significantly reduces the positional occupancy in models like SAM, from a substantial 4096 to a more efficient and manageable 64 or even down to 1. Experimental results demonstrate that VLMs with multiple experts exhibit consistently superior performance over isolated visual encoders and mark a significant performance boost as more experts are integrated. We have open-sourced the training code used in this report. All of these resources can be found on our project website.
- [1718] arXiv:2401.17230 (cross-list from cs.SD) [ pdf , other ]
-
Title: ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf modelsAuthors: Jee-weon Jung , Wangyou Zhang , Jiatong Shi , Zakaria Aldeneh , Takuya Higuchi , Barry-John Theobald , Ahmed Hussen Abdelaziz , Shinji WatanabeComments: 5 pages, 3 figures, 7 tablesSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Abstract: This paper introduces ESPnet-SPK, a toolkit designed with several objectives for training speaker embedding extractors. First, we provide an open-source platform for researchers in the speaker recognition community to effortlessly build models. We provide several models, ranging from x-vector to recent SKA-TDNN. Through the modularized architecture design, variants can be developed easily. We also aspire to bridge developed models with other domains, facilitating the broad research community to effortlessly incorporate state-of-the-art embedding extractors. Pre-trained embedding extractors can be accessed in an off-the-shelf manner and we demonstrate the toolkit's versatility by showcasing its integration with two tasks. Another goal is to integrate with diverse self-supervised learning features. We release a reproducible recipe that achieves an equal error rate of 0.39% on the Vox1-O evaluation protocol using WavLM-Large with ECAPA-TDNN.
- [1719] arXiv:2401.17244 (cross-list from cs.CL) [ pdf , other ]
-
Title: LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and DistillationComments: 22 pages, 4 figuresSubjects: Computation and Language (cs.CL) ; Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI)
Abstract: Reducing hallucination of Large Language Models (LLMs) is imperative for use in the sciences where reproducibility is crucial. However, LLMs inherently lack long-term memory, making it a nontrivial, ad hoc, and inevitably biased task to fine-tune them on domain-specific literature and data. Here we introduce LLaMP, a multimodal retrieval-augmented generation (RAG) framework of multiple data-aware reasoning-and-acting (ReAct) agents that dynamically interact with computational and experimental data on Materials Project (MP). Without fine-tuning, LLaMP demonstrates an ability to comprehend and integrate various modalities of materials science concepts, fetch relevant data stores on the fly, process higher-order data (such as crystal structures and elastic tensors), and summarize multi-step procedures for solid-state synthesis. We show that LLaMP effectively corrects errors in GPT-3.5's intrinsic knowledge, reducing a 5.21% MAPE on frequently-documented bandgaps and a significant 1103.54% MAPE on formation energies -- errors that GPT-3.5 seems to derive from mixed data sources. Additionally, LLaMP substantially reduces the hallucinated volumetric strain in a diamond cubic silicon structure from 66.3% to 0. The proposed framework offers an intuitive and nearly hallucination-free approach to exploring materials informatics and establishes a pathway for knowledge distillation and fine-tuning other language models. We envision the framework as a valuable component for scientific hypotheses and a foundation for future autonomous laboratories where multiple LLM agents communicate and cooperate with robotics to drive material synthesis and chemical reactions without hard-coded human logic and intervention.
- [1720] arXiv:2401.17263 (cross-list from cs.LG) [ pdf , other ]
-
Title: Robust Prompt Optimization for Defending Language Models Against Jailbreaking AttacksComments: website and code available at this https URLSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Despite advances in AI alignment, language models (LM) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries modify input prompts to induce harmful behavior. While some defenses have been proposed, they focus on narrow threat models and fall short of a strong defense, which we posit should be effective, universal, and practical. To achieve this, we propose the first adversarial objective for defending LMs against jailbreaking attacks and an algorithm, robust prompt optimization (RPO), that uses gradient-based token optimization to enforce harmless outputs. This results in an easily accessible suffix that significantly improves robustness to both jailbreaks seen during optimization and unknown, held-out jailbreaks, reducing the attack success rate on Starling-7B from 84% to 8.66% across 20 jailbreaks. In addition, we find that RPO has a minor effect on benign use, is successful under adaptive attacks, and can transfer to black-box models, reducing the success rate of the strongest attack on GPT-4, GUARD, from 92% to 6%.
- [1721] arXiv:2401.17264 (cross-list from cs.SD) [ pdf , other ]
-
Title: Proactive Detection of Voice Cloning with Localized WatermarkingAuthors: Robin San Roman , Pierre Fernandez , Alexandre Défossez , Teddy Furon , Tuan Tran , Hady ElsaharComments: Code at this https URLSubjects: Sound (cs.SD) ; Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Abstract: In the rapidly evolving field of speech generative models, there is a pressing need to ensure audio authenticity against the risks of voice cloning. We present AudioSeal, the first audio watermarking technique designed specifically for localized detection of AI-generated speech. AudioSeal employs a generator/detector architecture trained jointly with a localization loss to enable localized watermark detection up to the sample level, and a novel perceptual loss inspired by auditory masking, that enables AudioSeal to achieve better imperceptibility. AudioSeal achieves state-of-the-art performance in terms of robustness to real life audio manipulations and imperceptibility based on automatic and human evaluation metrics. Additionally, AudioSeal is designed with a fast, single-pass detector, that significantly surpasses existing models in speed - achieving detection up to two orders of magnitude faster, making it ideal for large-scale and real-time applications.
- [1722] arXiv:2401.17268 (cross-list from cs.CL) [ pdf , other ]
-
Title: Weaver: Foundation Models for Creative WritingAuthors: Tiannan Wang , Jiamin Chen , Qingrui Jia , Shuai Wang , Ruoyu Fang , Huilin Wang , Zhaowei Gao , Chunzhao Xie , Chuou Xu , Jihong Dai , Yibin Liu , Jialong Wu , Shengwei Ding , Long Li , Zhiwei Huang , Xinle Deng , Teng Yu , Gangan Ma , Han Xiao , Zixin Chen , Danjun Xiang , Yunxia Wang , Yuanyuan Zhu , Yi Xiao , Jing Wang , Yiru Wang , Siran Ding , Jiayang Huang , Jiayi Xu , Yilihamu Tayier , Zhenyu Hu , Yuan Gao , Chengfeng Zheng , Yueshu Ye , Yihang Li , Lei Wan , Xinyue Jiang , Yujie Wang , Siyu Cheng , Zhule Song , Xiangru Tang , Xiaohua Xu , Ningyu Zhang , Huajun Chen , Yuchen Eleanor Jiang , Wangchunshu ZhouSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This work introduces Weaver, our first family of large language models (LLMs) dedicated to content creation. Weaver is pre-trained on a carefully selected corpus that focuses on improving the writing capabilities of large language models. We then fine-tune Weaver for creative and professional writing purposes and align it to the preference of professional writers using a suit of novel methods for instruction data synthesis and LLM alignment, making it able to produce more human-like texts and follow more diverse instructions for content creation. The Weaver family consists of models of Weaver Mini (1.8B), Weaver Base (6B), Weaver Pro (14B), and Weaver Ultra (34B) sizes, suitable for different applications and can be dynamically dispatched by a routing agent according to query complexity to balance response quality and computation cost. Evaluation on a carefully curated benchmark for assessing the writing capabilities of LLMs shows Weaver models of all sizes outperform generalist LLMs several times larger than them. Notably, our most-capable Weaver Ultra model surpasses GPT-4, a state-of-the-art generalist LLM, on various writing scenarios, demonstrating the advantage of training specialized LLMs for writing purposes. Moreover, Weaver natively supports retrieval-augmented generation (RAG) and function calling (tool usage). We present various use cases of these abilities for improving AI-assisted writing systems, including integration of external knowledge bases, tools, or APIs, and providing personalized writing assistance. Furthermore, we discuss and summarize a guideline and best practices for pre-training and fine-tuning domain-specific LLMs.
- [1723] arXiv:2401.17319 (cross-list from cs.CR) [ pdf , other ]
-
Title: Decentralized Federated Learning: A Survey on Security and PrivacyComments: Accepted for publication in IEEE Transactions on Big DataJournal-ref: IEEE Transactions on Big Data, vol. 10, no. 2, pp. 194-213, 2024Subjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Abstract: Federated learning has been rapidly evolving and gaining popularity in recent years due to its privacy-preserving features, among other advantages. Nevertheless, the exchange of model updates and gradients in this architecture provides new attack surfaces for malicious users of the network which may jeopardize the model performance and user and data privacy. For this reason, one of the main motivations for decentralized federated learning is to eliminate server-related threats by removing the server from the network and compensating for it through technologies such as blockchain. However, this advantage comes at the cost of challenging the system with new privacy threats. Thus, performing a thorough security analysis in this new paradigm is necessary. This survey studies possible variations of threats and adversaries in decentralized federated learning and overviews the potential defense mechanisms. Trustability and verifiability of decentralized federated learning are also considered in this study.
- [1724] arXiv:2401.17342 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Latent Space Metric for Enhancing Prediction Confidence in Earth Observation DataAuthors: Ioannis Pitsiorlas , Argyro Tsantalidou , George Arvanitakis , Marios Kountouris , Charalambos KontoesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: This study presents a new approach for estimating confidence in machine learning model predictions, specifically in regression tasks utilizing Earth Observation (EO) data, with a particular focus on mosquito abundance (MA) estimation. We take advantage of a Variational AutoEncoder architecture, to derive a confidence metric by the latent space representations of EO datasets. This methodology is pivotal in establishing a correlation between the Euclidean distance in latent representations and the Absolute Error (AE) in individual MA predictions. Our research focuses on EO datasets from the Veneto region in Italy and the Upper Rhine Valley in Germany, targeting areas significantly affected by mosquito populations. A key finding is a notable correlation of 0.46 between the AE of MA predictions and the proposed confidence metric. This correlation signifies a robust, new metric for quantifying the reliability and enhancing the trustworthiness of the AI model's predictions in the context of both EO data analysis and mosquito abundance studies.
- [1725] arXiv:2401.17343 (cross-list from cs.CV) [ pdf , other ]
-
Title: YTCommentQA: Video Question Answerability in Instructional VideosComments: AAAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Instructional videos provide detailed how-to guides for various tasks, with viewers often posing questions regarding the content. Addressing these questions is vital for comprehending the content, yet receiving immediate answers is difficult. While numerous computational models have been developed for Video Question Answering (Video QA) tasks, they are primarily trained on questions generated based on video content, aiming to produce answers from within the content. However, in real-world situations, users may pose questions that go beyond the video's informational boundaries, highlighting the necessity to determine if a video can provide the answer. Discerning whether a question can be answered by video content is challenging due to the multi-modal nature of videos, where visual and verbal information are intertwined. To bridge this gap, we present the YTCommentQA dataset, which contains naturally-generated questions from YouTube, categorized by their answerability and required modality to answer -- visual, script, or both. Experiments with answerability classification tasks demonstrate the complexity of YTCommentQA and emphasize the need to comprehend the combined role of visual and script information in video reasoning. The dataset is available at this https URL .
- [1726] arXiv:2401.17350 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: Time Series Supplier Allocation via Deep Black-Litterman ModelAuthors: Jiayuan Luo , Wentao Zhang , Yuchen Fang , Xiaowei Gao , Dingyi Zhuang , Hao Chen , Xinke JiangComments: In submission to SIGKDD 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Time Series Supplier Allocation (TSSA) poses a complex NP-hard challenge, aimed at refining future order dispatching strategies to satisfy order demands with maximum supply efficiency fully. Traditionally derived from financial portfolio management, the Black-Litterman (BL) model offers a new perspective for the TSSA scenario by balancing expected returns against insufficient supply risks. However, its application within TSSA is constrained by the reliance on manually constructed perspective matrices and spatio-temporal market dynamics, coupled with the absence of supervisory signals and data unreliability inherent to supplier information. To solve these limitations, we introduce the pioneering Deep Black-Litterman Model (DBLM), which innovatively adapts the BL model from financial roots to supply chain context. Leveraging the Spatio-Temporal Graph Neural Networks (STGNNS), DBLM automatically generates future perspective matrices for TSSA, by integrating spatio-temporal dependency. Moreover, a novel Spearman rank correlation distinctively supervises our approach to address the lack of supervisory signals, specifically designed to navigate through the complexities of supplier risks and interactions. This is further enhanced by a masking mechanism aimed at counteracting the biases from unreliable data, thereby improving the model's precision and reliability. Extensive experimentation on two datasets unequivocally demonstrates DBLM's enhanced performance in TSSA, setting new standards for the field. Our findings and methodology are made available for community access and further development.
- [1727] arXiv:2401.17373 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Arabic Tweet Act: A Weighted Ensemble Pre-Trained Transformer Model for Classifying Arabic Speech Acts on TwitterComments: 16 pages, 6 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Speech acts are a speakers actions when performing an utterance within a conversation, such as asking, recommending, greeting, or thanking someone, expressing a thought, or making a suggestion. Understanding speech acts helps interpret the intended meaning and actions behind a speakers or writers words. This paper proposes a Twitter dialectal Arabic speech act classification approach based on a transformer deep learning neural network. Twitter and social media, are becoming more and more integrated into daily life. As a result, they have evolved into a vital source of information that represents the views and attitudes of their users. We proposed a BERT based weighted ensemble learning approach to integrate the advantages of various BERT models in dialectal Arabic speech acts classification. We compared the proposed model against several variants of Arabic BERT models and sequence-based models. We developed a dialectal Arabic tweet act dataset by annotating a subset of a large existing Arabic sentiment analysis dataset (ASAD) based on six speech act categories. We also evaluated the models on a previously developed Arabic Tweet Act dataset (ArSAS). To overcome the class imbalance issue commonly observed in speech act problems, a transformer-based data augmentation model was implemented to generate an equal proportion of speech act categories. The results show that the best BERT model is araBERTv2-Twitter models with a macro-averaged F1 score and an accuracy of 0.73 and 0.84, respectively. The performance improved using a BERT-based ensemble method with a 0.74 and 0.85 averaged F1 score and accuracy on our dataset, respectively.
- [1728] arXiv:2401.17377 (cross-list from cs.CL) [ pdf , other ]
-
Title: Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion TokensSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: Are $n$-gram language models still relevant in this era of neural large language models (LLMs)? Our answer is yes, and we showcase their values in both text analysis and improving neural LLMs. This was done by modernizing $n$-gram LMs in two aspects. First, we train them at the same data scale as neural LLMs -- 5 trillion tokens. This is the largest $n$-gram LM ever built. Second, existing $n$-gram LMs use small $n$ which hinders their performance; we instead allow $n$ to be arbitrarily large, by introducing a new $\infty$-gram LM with backoff. Instead of pre-computing $n$-gram count tables (which would be very expensive), we develop an engine named infini-gram -- powered by suffix arrays -- that can compute $\infty$-gram (as well as $n$-gram with arbitrary $n$) probabilities with millisecond-level latency. The $\infty$-gram framework and infini-gram engine enable us to conduct many novel and interesting analyses of human-written and machine-generated text: we find that the $\infty$-gram LM has fairly high accuracy for next-token prediction (47%), and can complement neural LLMs to greatly reduce their perplexity. When analyzing machine-generated text, we also observe irregularities in the machine--$\infty$-gram agreement level with respect to the suffix length, which indicates deficiencies in neural LLM pretraining and the positional embeddings of Transformers.
- [1729] arXiv:2401.17390 (cross-list from cs.CL) [ pdf , other ]
-
Title: Customizing Language Model Responses with Contrastive In-Context LearningComments: Accepted to appear at AAAI 2024Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLMs) are becoming increasingly important for machine learning applications. However, it can be challenging to align LLMs with our intent, particularly when we want to generate content that is preferable over others or when we want the LLM to respond in a certain style or tone that is hard to describe. To address this challenge, we propose an approach that uses contrastive examples to better describe our intent. This involves providing positive examples that illustrate the true intent, along with negative examples that show what characteristics we want LLMs to avoid. The negative examples can be retrieved from labeled data, written by a human, or generated by the LLM itself. Before generating an answer, we ask the model to analyze the examples to teach itself what to avoid. This reasoning step provides the model with the appropriate articulation of the user's need and guides it towards generting a better answer. We tested our approach on both synthesized and real-world datasets, including StackExchange and Reddit, and found that it significantly improves performance compared to standard few-shot prompting
- [1730] arXiv:2401.17396 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Fine-tuning Transformer-based Encoder for Turkish Language Understanding TasksAuthors: Savas YildirimSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Deep learning-based and lately Transformer-based language models have been dominating the studies of natural language processing in the last years. Thanks to their accurate and fast fine-tuning characteristics, they have outperformed traditional machine learning-based approaches and achieved state-of-the-art results for many challenging natural language understanding (NLU) problems. Recent studies showed that the Transformer-based models such as BERT, which is Bidirectional Encoder Representations from Transformers, have reached impressive achievements on many tasks. Moreover, thanks to their transfer learning capacity, these architectures allow us to transfer pre-built models and fine-tune them to specific NLU tasks such as question answering. In this study, we provide a Transformer-based model and a baseline benchmark for the Turkish Language. We successfully fine-tuned a Turkish BERT model, namely BERTurk that is trained with base settings, to many downstream tasks and evaluated with a the Turkish Benchmark dataset. We showed that our studies significantly outperformed other existing baseline approaches for Named-Entity Recognition, Sentiment Analysis, Question Answering and Text Classification in Turkish Language. We publicly released these four fine-tuned models and resources in reproducibility and with the view of supporting other Turkish researchers and applications.
- [1731] arXiv:2401.17401 (cross-list from cs.LG) [ pdf , other ]
-
Title: Step-size Optimization for Continual LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: In continual learning, a learner has to keep learning from the data over its whole life time. A key issue is to decide what knowledge to keep and what knowledge to let go. In a neural network, this can be implemented by using a step-size vector to scale how much gradient samples change network weights. Common algorithms, like RMSProp and Adam, use heuristics, specifically normalization, to adapt this step-size vector. In this paper, we show that those heuristics ignore the effect of their adaptation on the overall objective function, for example by moving the step-size vector away from better step-size vectors. On the other hand, stochastic meta-gradient descent algorithms, like IDBD (Sutton, 1992), explicitly optimize the step-size vector with respect to the overall objective function. On simple problems, we show that IDBD is able to consistently improve step-size vectors, where RMSProp and Adam do not. We explain the differences between the two approaches and their respective limitations. We conclude by suggesting that combining both approaches could be a promising future direction to improve the performance of neural networks in continual learning.
- [1732] arXiv:2401.17417 (cross-list from cs.CV) [ pdf , other ]
-
Title: Through-Wall Imaging based on WiFi Channel State InformationSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This work presents a seminal approach for synthesizing images from WiFi Channel State Information (CSI) in through-wall scenarios. Leveraging the strengths of WiFi, such as cost-effectiveness, illumination invariance, and wall-penetrating capabilities, our approach enables visual monitoring of indoor environments beyond room boundaries and without the need for cameras. More generally, it improves the interpretability of WiFi CSI by unlocking the option to perform image-based downstream tasks, e.g., visual activity recognition. In order to achieve this crossmodal translation from WiFi CSI to images, we rely on a multimodal Variational Autoencoder (VAE) adapted to our problem specifics. We extensively evaluate our proposed methodology through an ablation study on architecture configuration and a quantitative/qualitative assessment of reconstructed images. Our results demonstrate the viability of our method and highlight its potential for practical applications.
- [1733] arXiv:2401.17426 (cross-list from cs.LG) [ pdf , other ]
-
Title: Superiority of Multi-Head Attention in In-Context Linear RegressionSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: We present a theoretical analysis of the performance of transformer with softmax attention in in-context learning with linear regression tasks. While the existing literature predominantly focuses on the convergence of transformers with single-/multi-head attention, our research centers on comparing their performance. We conduct an exact theoretical analysis to demonstrate that multi-head attention with a substantial embedding dimension performs better than single-head attention. When the number of in-context examples D increases, the prediction loss using single-/multi-head attention is in O(1/D), and the one for multi-head attention has a smaller multiplicative constant. In addition to the simplest data distribution setting, we consider more scenarios, e.g., noisy labels, local examples, correlated features, and prior knowledge. We observe that, in general, multi-head attention is preferred over single-head attention. Our results verify the effectiveness of the design of multi-head attention in the transformer architecture.
- [1734] arXiv:2401.17434 (cross-list from cs.CY) [ pdf , ps , other ]
-
Title: Integrating Generative AI in Hackathons: Opportunities, Challenges, and Educational ImplicationsAuthors: Ramteja Sajja , Carlos Erazo Ramirez , Zhouyayan Li , Bekir Z. Demiray , Yusuf Sermet , Ibrahim DemirComments: 8491 words, 23 pages, 12 figuresSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Abstract: Hackathons and software competitions, increasingly pivotal in the software industry, serve as vital catalysts for innovation and skill development for both organizations and students. These platforms enable companies to prototype ideas swiftly, while students gain enriched learning experiences, enhancing their practical skills. Over the years, hackathons have transitioned from mere competitive events to significant educational tools, fusing theoretical knowledge with real-world problem-solving. The integration of hackathons into computer science and software engineering curricula aims to align educational proficiencies within a collaborative context, promoting peer connectivity and enriched learning via industry-academia collaborations. However, the infusion of advanced technologies, notably artificial intelligence (AI), and machine learning, into hackathons is revolutionizing their structure and outcomes. This evolution brings forth both opportunities, like enhanced learning experiences, and challenges, such as ethical concerns. This study delves into the impact of generative AI, examining its influence on student's technological choices based on a case study on the University of Iowa 2023 event. The exploration provides insights into AI's role in hackathons, and its educational implications, and offers a roadmap for the integration of such technologies in future events, ensuring innovation is balanced with ethical and educational considerations.
- [1735] arXiv:2401.17435 (cross-list from cs.LG) [ pdf , other ]
-
Title: Can Large Language Models Replace Economic Choice Prediction Labs?Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC)
Abstract: Economic choice prediction is an essential challenging task, often constrained by the difficulties in acquiring human choice data. Indeed, experimental economics studies had focused mostly on simple choice settings. The AI community has recently contributed to that effort in two ways: considering whether LLMs can substitute for humans in the above-mentioned simple choice prediction settings, and the study through ML lens of more elaborated but still rigorous experimental economics settings, employing incomplete information, repetitive play, and natural language communication, notably language-based persuasion games. This leaves us with a major inspiration: can LLMs be used to fully simulate the economic environment and generate data for efficient human choice prediction, substituting for the elaborated economic lab studies? We pioneer the study of this subject, demonstrating its feasibility. In particular, we show that a model trained solely on LLM-generated data can effectively predict human behavior in a language-based persuasion game, and can even outperform models trained on actual human data.
- [1736] arXiv:2401.17441 (cross-list from cs.LG) [ pdf , other ]
-
Title: Explaining Predictive Uncertainty by Exposing Second-Order EffectsComments: 12 pages + supplementSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Explainable AI has brought transparency into complex ML blackboxes, enabling, in particular, to identify which features these models use for their predictions. So far, the question of explaining predictive uncertainty, i.e. why a model 'doubts', has been scarcely studied. Our investigation reveals that predictive uncertainty is dominated by second-order effects, involving single features or product interactions between them. We contribute a new method for explaining predictive uncertainty based on these second-order effects. Computationally, our method reduces to a simple covariance computation over a collection of first-order explanations. Our method is generally applicable, allowing for turning common attribution techniques (LRP, Gradient x Input, etc.) into powerful second-order uncertainty explainers, which we call CovLRP, CovGI, etc. The accuracy of the explanations our method produces is demonstrated through systematic quantitative evaluations, and the overall usefulness of our method is demonstrated via two practical showcases.
- [1737] arXiv:2401.17443 (cross-list from cs.LG) [ pdf , other ]
-
Title: Liquid Democracy for Low-Cost Ensemble PruningComments: 30 pages, 20 figures. Extended abstract to appear at AAMAS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Abstract: We argue that there is a strong connection between ensemble learning and a delegative voting paradigm -- liquid democracy -- that can be leveraged to reduce ensemble training costs. We present an incremental training procedure that identifies and removes redundant classifiers from an ensemble via delegation mechanisms inspired by liquid democracy. Through both analysis and extensive experiments we show that this process greatly reduces the computational cost of training compared to training a full ensemble. By carefully selecting the underlying delegation mechanism, weight centralization in the classifier population is avoided, leading to higher accuracy than some boosting methods. Furthermore, this work serves as an exemplar of how frameworks from computational social choice literature can be applied to problems in nontraditional domains.
- [1738] arXiv:2401.17459 (cross-list from cs.CR) [ pdf , ps , other ]
-
Title: A Preliminary Study on Using Large Language Models in Software PentestingAuthors: Kumar Shashwat , Francis Hahn , Xinming Ou , Dmitry Goldgof , Lawrence Hall , Jay Ligatti , S. Raj Rajgopalan , Armin Ziaie TabariSubjects: Cryptography and Security (cs.CR) ; Artificial Intelligence (cs.AI)
Abstract: Large language models (LLM) are perceived to offer promising potentials for automating security tasks, such as those found in security operation centers (SOCs). As a first step towards evaluating this perceived potential, we investigate the use of LLMs in software pentesting, where the main task is to automatically identify software security vulnerabilities in source code. We hypothesize that an LLM-based AI agent can be improved over time for a specific security task as human operators interact with it. Such improvement can be made, as a first step, by engineering prompts fed to the LLM based on the responses produced, to include relevant contexts and structures so that the model provides more accurate results. Such engineering efforts become sustainable if the prompts that are engineered to produce better results on current tasks, also produce better results on future unknown tasks. To examine this hypothesis, we utilize the OWASP Benchmark Project 1.2 which contains 2,740 hand-crafted source code test cases containing various types of vulnerabilities. We divide the test cases into training and testing data, where we engineer the prompts based on the training data (only), and evaluate the final system on the testing data. We compare the AI agent's performance on the testing data against the performance of the agent without the prompt engineering. We also compare the AI agent's results against those from SonarQube, a widely used static code analyzer for security testing. We built and tested multiple versions of the AI agent using different off-the-shelf LLMs -- Google's Gemini-pro, as well as OpenAI's GPT-3.5-Turbo and GPT-4-Turbo (with both chat completion and assistant APIs). The results show that using LLMs is a viable approach to build an AI agent for software pentesting that can improve through repeated use and prompt engineering.
- [1739] arXiv:2401.17461 (cross-list from cs.CL) [ pdf , other ]
-
Title: Synthetic Dialogue Dataset Generation using LLM AgentsComments: GEM Workshop @ EMNLP 2023Subjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Linear programming (LP) problems are pervasive in real-life applications. However, despite their apparent simplicity, an untrained user may find it difficult to determine the linear model of their specific problem. We envisage the creation of a goal-oriented conversational agent that will engage in conversation with the user to elicit all information required so that a subsequent agent can generate the linear model. In this paper, we present an approach for the generation of sample dialogues that can be used to develop and train such a conversational agent. Using prompt engineering, we develop two agents that "talk" to each other, one acting as the conversational agent, and the other acting as the user. Using a set of text descriptions of linear problems from NL4Opt available to the user only, the agent and the user engage in conversation until the agent has retrieved all key information from the original problem description. We also propose an extrinsic evaluation of the dialogues by assessing how well the summaries generated by the dialogues match the original problem descriptions. We conduct human and automatic evaluations, including an evaluation approach that uses GPT-4 to mimic the human evaluation metrics. The evaluation results show an overall good quality of the dialogues, though research is still needed to improve the quality of the GPT-4 evaluation metrics. The resulting dialogues, including the human annotations of a subset, are available to the research community. The conversational agent used for the generation of the dialogues can be used as a baseline.
- [1740] arXiv:2401.17477 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Detecting mental disorder on social media: a ChatGPT-augmented explainable approachSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
Abstract: In the digital era, the prevalence of depressive symptoms expressed on social media has raised serious concerns, necessitating advanced methodologies for timely detection. This paper addresses the challenge of interpretable depression detection by proposing a novel methodology that effectively combines Large Language Models (LLMs) with eXplainable Artificial Intelligence (XAI) and conversational agents like ChatGPT. In our methodology, explanations are achieved by integrating BERTweet, a Twitter-specific variant of BERT, into a novel self-explanatory model, namely BERT-XDD, capable of providing both classification and explanations via masked attention. The interpretability is further enhanced using ChatGPT to transform technical explanations into human-readable commentaries. By introducing an effective and modular approach for interpretable depression detection, our methodology can contribute to the development of socially responsible digital platforms, fostering early intervention and support for mental health challenges under the guidance of qualified healthcare professionals.
- [1741] arXiv:2401.17500 (cross-list from cs.RO) [ pdf , other ]
-
Title: LeTO: Learning Constrained Visuomotor Policy with Differentiable Trajectory OptimizationComments: 8 pages, 5 figuresSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: This paper introduces LeTO, a method for learning constrained visuomotor policy via differentiable trajectory optimization. Our approach uniquely integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to end-to-end generate actions in a safe and controlled fashion without extra modules. Our method allows for the introduction of constraints information during the training process, thereby balancing the training objectives of satisfying constraints, smoothing the trajectories, and minimizing errors with demonstrations. This "gray box" method marries the optimization-based safety and interpretability with the powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and on the real robot. In simulation, LeTO achieves a success rate comparable to state-of-the-art imitation learning methods, but the generated trajectories are of less uncertainty, higher quality, and smoother. In real-world experiments, we deployed LeTO to handle constraints-critical tasks. The results show the effectiveness of LeTO comparing with state-of-the-art imitation learning approaches. We release our code at this https URL .
- [1742] arXiv:2401.17505 (cross-list from cs.LG) [ pdf , other ]
-
Title: Arrows of Time for Large Language ModelsComments: Updated 1 figure, minor corrections to text. 11 figures, 16 pagesSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: We study the probabilistic modeling performed by Autoregressive Large Language Models through the angle of time directionality. We empirically find a time asymmetry exhibited by such models in their ability to model natural language: a difference in the average log-perplexity when trying to predict the next token versus when trying to predict the previous one. This difference is at the same time subtle and very consistent across various modalities (language, model size, training time, ...). Theoretically, this is surprising: from an information-theoretic point of view, there should be no such difference. We provide a theoretical framework to explain how such an asymmetry can appear from sparsity and computational complexity considerations, and outline a number of perspectives opened by our results.
- [1743] arXiv:2401.17542 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation ModelsSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Foundation models, pre-trained on massive datasets, have achieved unprecedented generalizability. However, is it truly necessary to involve such vast amounts of data in pre-training, consuming extensive computational resources? This paper introduces data-effective learning, aiming to use data in the most impactful way to pre-train foundation models. This involves strategies that focus on data quality rather than quantity, ensuring the data used for training has high informational value. Data-effective learning plays a profound role in accelerating foundation model training, reducing computational costs, and saving data storage, which is very important as the volume of medical data in recent years has grown beyond many people's expectations. However, due to the lack of standards and comprehensive benchmarks, research on medical data-effective learning is poorly studied. To address this gap, our paper introduces a comprehensive benchmark specifically for evaluating data-effective learning in the medical field. This benchmark includes a dataset with millions of data samples from 31 medical centers (DataDEL), a baseline method for comparison (MedDEL), and a new evaluation metric (NormDEL) to objectively measure data-effective learning performance. Our extensive experimental results show the baseline MedDEL can achieve performance comparable to the original large dataset with only 5% of the data. Establishing such an open data-effective learning benchmark is crucial for the medical foundation model research community because it facilitates efficient data use, promotes collaborative breakthroughs, and fosters the development of cost-effective, scalable, and impactful healthcare solutions.
- [1744] arXiv:2401.17548 (cross-list from cs.LG) [ pdf , other ]
-
Title: Rethinking Channel Dependence for Multivariate Time Series Forecasting: Learning from Leading IndicatorsComments: Accepted to ICLR 2024. Code is at this https URLJournal-ref: The Twelfth International Conference on Learning Representations, 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recently, channel-independent methods have achieved state-of-the-art performance in multivariate time series (MTS) forecasting. Despite reducing overfitting risks, these methods miss potential opportunities in utilizing channel dependence for accurate predictions. We argue that there exist locally stationary lead-lag relationships between variates, i.e., some lagged variates may follow the leading indicators within a short time period. Exploiting such channel dependence is beneficial since leading indicators offer advance information that can be used to reduce the forecasting difficulty of the lagged variates. In this paper, we propose a new method named LIFT that first efficiently estimates leading indicators and their leading steps at each time step and then judiciously allows the lagged variates to utilize the advance information from leading indicators. LIFT plays as a plugin that can be seamlessly collaborated with arbitrary time series forecasting methods. Extensive experiments on six real-world datasets demonstrate that LIFT improves the state-of-the-art methods by 5.5% in average forecasting performance. Our code is available at this https URL .
- [1745] arXiv:2401.17580 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph Contrastive Learning with Cohesive Subgraph AwarenessSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Graph contrastive learning (GCL) has emerged as a state-of-the-art strategy for learning representations of diverse graphs including social and biomedical networks. GCL widely uses stochastic graph topology augmentation, such as uniform node dropping, to generate augmented graphs. However, such stochastic augmentations may severely damage the intrinsic properties of a graph and deteriorate the following representation learning process. We argue that incorporating an awareness of cohesive subgraphs during the graph augmentation and learning processes has the potential to enhance GCL performance. To this end, we propose a novel unified framework called CTAug, to seamlessly integrate cohesion awareness into various existing GCL mechanisms. In particular, CTAug comprises two specialized modules: topology augmentation enhancement and graph learning enhancement. The former module generates augmented graphs that carefully preserve cohesion properties, while the latter module bolsters the graph encoder's ability to discern subgraph patterns. Theoretical analysis shows that CTAug can strictly improve existing GCL mechanisms. Empirical experiments verify that CTAug can achieve state-of-the-art performance for graph representation learning, especially for graphs with high degrees. The code is available at this https URL , or this https URL .
- [1746] arXiv:2401.17583 (cross-list from cs.RO) [ pdf , other ]
-
Title: Agile But Safe: Learning Collision-Free High-Speed Legged LocomotionComments: Project website: this https URLSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
Abstract: Legged robots navigating cluttered environments must be jointly agile for efficient task execution and safe to avoid collisions with obstacles or humans. Existing studies either develop conservative controllers (< 1.0 m/s) to ensure safety, or focus on agility without considering potentially fatal collisions. This paper introduces Agile But Safe (ABS), a learning-based control framework that enables agile and collision-free locomotion for quadrupedal robots. ABS involves an agile policy to execute agile motor skills amidst obstacles and a recovery policy to prevent failures, collaboratively achieving high-speed and collision-free navigation. The policy switch in ABS is governed by a learned control-theoretic reach-avoid value network, which also guides the recovery policy as an objective function, thereby safeguarding the robot in a closed loop. The training process involves the learning of the agile policy, the reach-avoid value network, the recovery policy, and an exteroception representation network, all in simulation. These trained modules can be directly deployed in the real world with onboard sensing and computation, leading to high-speed and collision-free navigation in confined indoor and outdoor spaces with both static and dynamic obstacles.
- [1747] arXiv:2401.17585 (cross-list from cs.CL) [ pdf , other ]
-
Title: Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual TasksComments: 22 pages, 14 figures, 5 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Methodology (stat.ME)
Abstract: Current approaches of knowledge editing struggle to effectively propagate updates to interconnected facts. In this work, we delve into the barriers that hinder the appropriate propagation of updated knowledge within these models for accurate reasoning. To support our analysis, we introduce a novel reasoning-based benchmark -- ReCoE (Reasoning-based Counterfactual Editing dataset) -- which covers six common reasoning schemes in real world. We conduct a thorough analysis of existing knowledge editing techniques, including input augmentation, finetuning, and locate-and-edit. We found that all model editing methods show notably low performance on this dataset, especially in certain reasoning schemes. Our analysis over the chain-of-thought generation of edited models further uncover key reasons behind the inadequacy of existing knowledge editing methods from a reasoning standpoint, involving aspects on fact-wise editing, fact recall ability, and coherence in generation. We will make our benchmark publicly available.
- [1748] arXiv:2401.17592 (cross-list from cs.CV) [ pdf , other ]
-
Title: Local Feature Matching Using Deep Learning: A SurveyComments: Accepted by Information Fusion 2024. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Local feature matching enjoys wide-ranging applications in the realm of computer vision, encompassing domains such as image retrieval, 3D reconstruction, and object recognition. However, challenges persist in improving the accuracy and robustness of matching due to factors like viewpoint and lighting variations. In recent years, the introduction of deep learning models has sparked widespread exploration into local feature matching techniques. The objective of this endeavor is to furnish a comprehensive overview of local feature matching methods. These methods are categorized into two key segments based on the presence of detectors. The Detector-based category encompasses models inclusive of Detect-then-Describe, Joint Detection and Description, Describe-then-Detect, as well as Graph Based techniques. In contrast, the Detector-free category comprises CNN Based, Transformer Based, and Patch Based methods. Our study extends beyond methodological analysis, incorporating evaluations of prevalent datasets and metrics to facilitate a quantitative comparison of state-of-the-art techniques. The paper also explores the practical application of local feature matching in diverse domains such as Structure from Motion, Remote Sensing Image Registration, and Medical Image Registration, underscoring its versatility and significance across various fields. Ultimately, we endeavor to outline the current challenges faced in this domain and furnish future research directions, thereby serving as a reference for researchers involved in local feature matching and its interconnected domains. A comprehensive list of studies in this survey is available at this https URL .
- [1749] arXiv:2401.17600 (cross-list from cs.CL) [ pdf , other ]
-
Title: Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation dataComments: 62 pages; work in progressSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Large Vision-Language Models (VLMs) have demonstrated impressive performance on complex tasks involving visual input with natural language instructions. However, it remains unclear to what extent capabilities on natural images transfer to Earth observation (EO) data, which are predominantly satellite and aerial images less common in VLM training data. In this work, we propose a comprehensive benchmark to gauge the progress of VLMs toward being useful tools for EO data by assessing their abilities on scene understanding, localization and counting, and change detection tasks. Motivated by real-world applications, our benchmark includes scenarios like urban monitoring, disaster relief, land use, and conservation. We discover that, although state-of-the-art VLMs like GPT-4V possess extensive world knowledge that leads to strong performance on open-ended tasks like location understanding and image captioning, their poor spatial reasoning limits usefulness on object localization and counting tasks. Our benchmark will be made publicly available at this https URL and on Hugging Face at this https URL for easy model evaluation.
- [1750] arXiv:2401.17617 (cross-list from cs.CV) [ pdf , other ]
-
Title: Unveiling the Power of Self-supervision for Multi-view Multi-human Association and TrackingSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Multi-view multi-human association and tracking (MvMHAT), is a new but important problem for multi-person scene video surveillance, aiming to track a group of people over time in each view, as well as to identify the same person across different views at the same time, which is different from previous MOT and multi-camera MOT tasks only considering the over-time human tracking. This way, the videos for MvMHAT require more complex annotations while containing more information for self learning. In this work, we tackle this problem with a self-supervised learning aware end-to-end network. Specifically, we propose to take advantage of the spatial-temporal self-consistency rationale by considering three properties of reflexivity, symmetry and transitivity. Besides the reflexivity property that naturally holds, we design the self-supervised learning losses based on the properties of symmetry and transitivity, for both appearance feature learning and assignment matrix optimization, to associate the multiple humans over time and across views. Furthermore, to promote the research on MvMHAT, we build two new large-scale benchmarks for the network training and testing of different algorithms. Extensive experiments on the proposed benchmarks verify the effectiveness of our method. We have released the benchmark and code to the public.
- [1751] arXiv:2401.17626 (cross-list from cs.SE) [ pdf , ps , other ]
-
Title: Generative AI to Generate Test Data GeneratorsAuthors: Benoit Baudry , Khashayar Etemadi , Sen Fang , Yogya Gamage , Yi Liu , Yuxin Liu , Martin Monperrus , Javier Ron , André Silva , Deepika TiwariSubjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Generating fake data is an essential dimension of modern software testing, as demonstrated by the number and significance of data faking libraries. Yet, developers of faking libraries cannot keep up with the wide range of data to be generated for different natural languages and domains. In this paper, we assess the ability of generative AI for generating test data in different domains. We design three types of prompts for Large Language Models (LLMs), which perform test data generation tasks at different levels of integrability: 1) raw test data generation, 2) synthesizing programs in a specific language that generate useful test data, and 3) producing programs that use state-of-the-art faker libraries. We evaluate our approach by prompting LLMs to generate test data for 11 domains. The results show that LLMs can successfully generate realistic test data generators in a wide range of domains at all three levels of integrability.
- [1752] arXiv:2401.17633 (cross-list from cs.CL) [ pdf , other ]
-
Title: Navigating the OverKill in Large Language ModelsAuthors: Chenyu Shi , Xiao Wang , Qiming Ge , Songyang Gao , Xianjun Yang , Tao Gui , Qi Zhang , Xuanjing Huang , Xun Zhao , Dahua LinSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: Large language models are meticulously aligned to be both helpful and harmless. However, recent research points to a potential overkill which means models may refuse to answer benign queries. In this paper, we investigate the factors for overkill by exploring how models handle and determine the safety of queries. Our findings reveal the presence of shortcuts within models, leading to an over-attention of harmful words like 'kill' and prompts emphasizing safety will exacerbate overkill. Based on these insights, we introduce Self-Contrastive Decoding (Self-CD), a training-free and model-agnostic strategy, to alleviate this phenomenon. We first extract such over-attention by amplifying the difference in the model's output distributions when responding to system prompts that either include or omit an emphasis on safety. Then we determine the final next-token predictions by downplaying the over-attention from the model via contrastive decoding. Empirical results indicate that our method has achieved an average reduction of the refusal rate by 20\% while having almost no impact on safety.
- [1753] arXiv:2401.17645 (cross-list from cs.IR) [ pdf , other ]
-
Title: ReSLLM: Large Language Models are Strong Resource Selectors for Federated SearchSubjects: Information Retrieval (cs.IR) ; Artificial Intelligence (cs.AI)
Abstract: Federated search, which involves integrating results from multiple independent search engines, will become increasingly pivotal in the context of Retrieval-Augmented Generation pipelines empowering LLM-based applications such as chatbots. These systems often distribute queries among various search engines, ranging from specialized (e.g., PubMed) to general (e.g., Google), based on the nature of user utterances. A critical aspect of federated search is resource selection - the selection of appropriate resources prior to issuing the query to ensure high-quality and rapid responses, and contain costs associated with calling the external search engines. However, current SOTA resource selection methodologies primarily rely on feature-based learning approaches. These methods often involve the labour intensive and expensive creation of training labels for each resource. In contrast, LLMs have exhibited strong effectiveness as zero-shot methods across NLP and IR tasks. We hypothesise that in the context of federated search LLMs can assess the relevance of resources without the need for extensive predefined labels or features. In this paper, we propose ReSLLM. Our ReSLLM method exploits LLMs to drive the selection of resources in federated search in a zero-shot setting. In addition, we devise an unsupervised fine tuning protocol, the Synthetic Label Augmentation Tuning (SLAT), where the relevance of previously logged queries and snippets from resources is predicted using an off-the-shelf LLM and then in turn used to fine-tune ReSLLM with respect to resource selection. Our empirical evaluation and analysis details the factors influencing the effectiveness of LLMs in this context. The results showcase the merits of ReSLLM for resource selection: not only competitive effectiveness in the zero-shot setting, but also obtaining large when fine-tuned using SLAT-protocol.
- [1754] arXiv:2401.17657 (cross-list from cs.LG) [ pdf , ps , other ]
-
Title: An attempt to generate new bridge types from latent space of energy-based modelAuthors: Hongjun ZhangComments: 8 pages, 6 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Use energy-based model for bridge-type innovation. The loss function is explained by the game theory, the logic is clear and the formula is simple and clear. Thus avoid the use of maximum likelihood estimation to explain the loss function and eliminate the need for Monte Carlo methods to solve the normalized denominator. Assuming that the bridge-type population follows a Boltzmann distribution, a neural network is constructed to represent the energy function. Use Langevin dynamics technology to generate a new sample with low energy value, thus a generative model of bridge-type based on energy is established. Train energy function on symmetric structured image dataset of three span beam bridge, arch bridge, cable-stayed bridge, and suspension bridge to accurately calculate the energy values of real and fake samples. Sampling from latent space, using gradient descent algorithm, the energy function transforms the sampling points into low energy score samples, thereby generating new bridge types different from the dataset. Due to unstable and slow training in this attempt, the possibility of generating new bridge types is rare and the image definition of generated images is low.
- [1755] arXiv:2401.17661 (cross-list from cs.SE) [ pdf , other ]
-
Title: Towards the implementation of Industry 4.0: A methodology-based approach oriented to the customer life cycleComments: Accepted version of paper: V\'ictor Julio Ram\'irez-Dur\'an, Idoia Berges, Arantza Illarramendi: Towards the implementation of Industry 4.0: A methodology-based approach oriented to the customer life cycle. Comput. Ind. 126: 103403 (2021). DOI: 10.1016/j.compind.2021.103403Journal-ref: Comput. Ind. 126: 103403 (2021)Subjects: Software Engineering (cs.SE) ; Artificial Intelligence (cs.AI)
Abstract: Many different worldwide initiatives are promoting the transformation from machine dominant manufacturing to digital manufacturing. Thus, to achieve a successful transformation to Industry 4.0 standard, manufacturing enterprises are required to implement a clear roadmap. However, Small and Medium Manufacturing Enterprises (SMEs) encounter many barriers and difficulties (economical, technical, cultural, etc.) in the implementation of Industry 4.0. Although several works deal with the incorporation of Industry 4.0 technologies in the area of the product and supply chain life cycles, which SMEs could use as reference, this is not the case for the customer life cycle. Thus, we present two contributions that can help the software engineers of those SMEs to incorporate Industry 4.0 technologies in the context of the customer life cycle. The first contribution is a methodology that can help those software engineers in the task of creating new software services, aligned with Industry 4.0, that allow to change how customers interact with enterprises and the experiences they have while interacting with them. The methodology details a set of stages that are divided into phases which in turn are made up of activities. It places special emphasis on the incorporation of semantics descriptions and 3D visualization in the implementation of those new services. The second contribution is a system developed for a real manufacturing scenario, using the proposed methodology, which allows to observe the possibilities that this kind of systems can offer to SMEs in two phases of the customer life cycle: Discover & Shop, and Use & Service.
- [1756] arXiv:2401.17671 (cross-list from cs.CL) [ pdf , other ]
-
Title: Contextual Feature Extraction Hierarchies Converge in Large Language Models and the BrainComments: 19 pages, 5 figures and 4 supplementary figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Neurons and Cognition (q-bio.NC)
Abstract: Recent advancements in artificial intelligence have sparked interest in the parallels between large language models (LLMs) and human neural processing, particularly in language comprehension. While prior research has established similarities in the representation of LLMs and the brain, the underlying computational principles that cause this convergence, especially in the context of evolving LLMs, remain elusive. Here, we examined a diverse selection of high-performance LLMs with similar parameter sizes to investigate the factors contributing to their alignment with the brain's language processing mechanisms. We find that as LLMs achieve higher performance on benchmark tasks, they not only become more brain-like as measured by higher performance when predicting neural responses from LLM embeddings, but also their hierarchical feature extraction pathways map more closely onto the brain's while using fewer layers to do the same encoding. We also compare the feature extraction pathways of the LLMs to each other and identify new ways in which high-performing models have converged toward similar hierarchical processing mechanisms. Finally, we show the importance of contextual information in improving model performance and brain similarity. Our findings reveal the converging aspects of language processing in the brain and LLMs and offer new directions for developing models that align more closely with human cognitive processing.
- [1757] arXiv:2401.17700 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Classification of executive functioning performance post-longitudinal tDCS using functional connectivity and machine learning methodsAuthors: Akash K Rao , Vishnu K Menon , Shashank Uttrani , Ayushman Dixit , Dipanshu Verma , Varun DuttComments: 7 pages, presented at the IEEE 20th India Council International Conference (INDICON 2023), Hyderabad, India, December 2023Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Executive functioning is a cognitive process that enables humans to plan, organize, and regulate their behavior in a goal-directed manner. Understanding and classifying the changes in executive functioning after longitudinal interventions (like transcranial direct current stimulation (tDCS)) has not been explored in the literature. This study employs functional connectivity and machine learning algorithms to classify executive functioning performance post-tDCS. Fifty subjects were divided into experimental and placebo control groups. EEG data was collected while subjects performed an executive functioning task on Day 1. The experimental group received tDCS during task training from Day 2 to Day 8, while the control group received sham tDCS. On Day 10, subjects repeated the tasks specified on Day 1. Different functional connectivity metrics were extracted from EEG data and eventually used for classifying executive functioning performance using different machine learning algorithms. Results revealed that a novel combination of partial directed coherence and multi-layer perceptron (along with recursive feature elimination) resulted in a high classification accuracy of 95.44%. We discuss the implications of our results in developing real-time neurofeedback systems for assessing and enhancing executive functioning performance post-tDCS administration.
- [1758] arXiv:2401.17711 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Prediction of multitasking performance post-longitudinal tDCS via EEG-based functional connectivity and machine learning methodsAuthors: Akash K Rao , Shashank Uttrani , Vishnu K Menon , Darshil Shah , Arnav Bhavsar , Shubhajit Roy Chowdhury , Varun DuttComments: 16 pages, presented at the 30th International Conference on Neural Information Processing (ICONIP2023), Changsha, China, November 2023Subjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Predicting and understanding the changes in cognitive performance, especially after a longitudinal intervention, is a fundamental goal in neuroscience. Longitudinal brain stimulation-based interventions like transcranial direct current stimulation (tDCS) induce short-term changes in the resting membrane potential and influence cognitive processes. However, very little research has been conducted on predicting these changes in cognitive performance post-intervention. In this research, we intend to address this gap in the literature by employing different EEG-based functional connectivity analyses and machine learning algorithms to predict changes in cognitive performance in a complex multitasking task. Forty subjects were divided into experimental and active-control conditions. On Day 1, all subjects executed a multitasking task with simultaneous 32-channel EEG being acquired. From Day 2 to Day 7, subjects in the experimental condition undertook 15 minutes of 2mA anodal tDCS stimulation during task training. Subjects in the active-control condition undertook 15 minutes of sham stimulation during task training. On Day 10, all subjects again executed the multitasking task with EEG acquisition. Source-level functional connectivity metrics, namely phase lag index and directed transfer function, were extracted from the EEG data on Day 1 and Day 10. Various machine learning models were employed to predict changes in cognitive performance. Results revealed that the multi-layer perceptron and directed transfer function recorded a cross-validation training RMSE of 5.11% and a test RMSE of 4.97%. We discuss the implications of our results in developing real-time cognitive state assessors for accurately predicting cognitive performance in dynamic and complex tasks post-tDCS intervention
- [1759] arXiv:2401.17733 (cross-list from cs.NE) [ pdf , ps , other ]
-
Title: Towards Physical Plausibility in Neuroevolution SystemsSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The increasing usage of Artificial Intelligence (AI) models, especially Deep Neural Networks (DNNs), is increasing the power consumption during training and inference, posing environmental concerns and driving the need for more energy-efficient algorithms and hardware solutions. This work addresses the growing energy consumption problem in Machine Learning (ML), particularly during the inference phase. Even a slight reduction in power usage can lead to significant energy savings, benefiting users, companies, and the environment. Our approach focuses on maximizing the accuracy of Artificial Neural Network (ANN) models using a neuroevolutionary framework whilst minimizing their power consumption. To do so, power consumption is considered in the fitness function. We introduce a new mutation strategy that stochastically reintroduces modules of layers, with power-efficient modules having a higher chance of being chosen. We introduce a novel technique that allows training two separate models in a single training step whilst promoting one of them to be more power efficient than the other while maintaining similar accuracy. The results demonstrate a reduction in power consumption of ANN models by up to 29.2% without a significant decrease in predictive performance.
- [1760] arXiv:2401.17739 (cross-list from math.NA) [ pdf , other ]
-
Title: Operator learning without the adjointComments: 49 pages, 5 figuresSubjects: Numerical Analysis (math.NA) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: There is a mystery at the heart of operator learning: how can one recover a non-self-adjoint operator from data without probing the adjoint? Current practical approaches suggest that one can accurately recover an operator while only using data generated by the forward action of the operator without access to the adjoint. However, naively, it seems essential to sample the action of the adjoint. In this paper, we partially explain this mystery by proving that without querying the adjoint, one can approximate a family of non-self-adjoint infinite-dimensional compact operators via projection onto a Fourier basis. We then apply the result to recovering Green's functions of elliptic partial differential operators and derive an adjoint-free sample complexity bound. While existing theory justifies low sample complexity in operator learning, ours is the first adjoint-free analysis that attempts to close the gap between theory and practice.
- [1761] arXiv:2401.17741 (cross-list from cs.RO) [ pdf , other ]
-
Title: Haris: an Advanced Autonomous Mobile Robot for Smart Parking AssistanceComments: Accepted in 2024 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 2024Subjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: This paper presents Haris, an advanced autonomous mobile robot system for tracking the location of vehicles in crowded car parks using license plate recognition. The system employs simultaneous localization and mapping (SLAM) for autonomous navigation and precise mapping of the parking area, eliminating the need for GPS dependency. In addition, the system utilizes a sophisticated framework using computer vision techniques for object detection and automatic license plate recognition (ALPR) for reading and associating license plate numbers with location data. This information is subsequently synchronized with a back-end service and made accessible to users via a user-friendly mobile app, offering effortless vehicle location and alleviating congestion within the parking facility. The proposed system has the potential to improve the management of short-term large outdoor parking areas in crowded places such as sports stadiums. The demo of the robot can be found on this https URL .
- [1762] arXiv:2401.17752 (cross-list from cs.LG) [ pdf , other ]
-
Title: PF-GNN: Differentiable particle filtering based approximation of universal graph representationsComments: Published as a conference paper at ICLR 2022Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Message passing Graph Neural Networks (GNNs) are known to be limited in expressive power by the 1-WL color-refinement test for graph isomorphism. Other more expressive models either are computationally expensive or need preprocessing to extract structural features from the graph. In this work, we propose to make GNNs universal by guiding the learning process with exact isomorphism solver techniques which operate on the paradigm of Individualization and Refinement (IR), a method to artificially introduce asymmetry and further refine the coloring when 1-WL stops. Isomorphism solvers generate a search tree of colorings whose leaves uniquely identify the graph. However, the tree grows exponentially large and needs hand-crafted pruning techniques which are not desirable from a learning perspective. We take a probabilistic view and approximate the search tree of colorings (i.e. embeddings) by sampling multiple paths from root to leaves of the search tree. To learn more discriminative representations, we guide the sampling process with particle filter updates, a principled approach for sequential state estimation. Our algorithm is end-to-end differentiable, can be applied with any GNN as backbone and learns richer graph representations with only linear increase in runtime. Experimental evaluation shows that our approach consistently outperforms leading GNN models on both synthetic benchmarks for isomorphism detection as well as real-world datasets.
- [1763] arXiv:2401.17776 (cross-list from cs.CV) [ pdf , other ]
-
Title: Double InfoGAN for Contrastive AnalysisComments: Accepted at AISTATS 2024Subjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: Contrastive Analysis (CA) deals with the discovery of what is common and what is distinctive of a target domain compared to a background one. This is of great interest in many applications, such as medical imaging. Current state-of-the-art (SOTA) methods are latent variable models based on VAE (CA-VAEs). However, they all either ignore important constraints or they don't enforce fundamental assumptions. This may lead to sub-optimal solutions where distinctive factors are mistaken for common ones (or viceversa). Furthermore, the generated images have a rather poor quality, typical of VAEs, decreasing their interpretability and usefulness. Here, we propose Double InfoGAN, the first GAN based method for CA that leverages the high-quality synthesis of GAN and the separation power of InfoGAN. Experimental results on four visual datasets, from simple synthetic examples to complex medical images, show that the proposed method outperforms SOTA CA-VAEs in terms of latent separation and image quality. Datasets and code are available online.
- [1764] arXiv:2401.17791 (cross-list from cs.LG) [ pdf , other ]
-
Title: Graph Transformers without Positional EncodingsAuthors: Ayush GargComments: Independent ResearchSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Recently, Transformers for graph representation learning have become increasingly popular, achieving state-of-the-art performance on a wide-variety of datasets, either alone or in combination with message-passing graph neural networks (MP-GNNs). Infusing graph inductive-biases in the innately structure-agnostic transformer architecture in the form of structural or positional encodings (PEs) is key to achieving these impressive results. However, designing such encodings is tricky and disparate attempts have been made to engineer such encodings including Laplacian eigenvectors, relative random-walk probabilities (RRWP), spatial encodings, centrality encodings, edge encodings etc. In this work, we argue that such encodings may not be required at all, provided the attention mechanism itself incorporates information about the graph structure. We introduce Eigenformer, a Graph Transformer employing a novel spectrum-aware attention mechanism cognizant of the Laplacian spectrum of the graph, and empirically show that it achieves performance comparable to SOTA Graph Transformers on a number of standard GNN benchmark datasets, even surpassing the SOTA on some datasets. The simpler attention mechanism also allows us to train wider and deeper models for a given parameter budget.
- [1765] arXiv:2401.17802 (cross-list from cs.LG) [ pdf , other ]
-
Title: Distillation Enhanced Time Series Forecasting Network with Momentum Contrastive LearningSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Contrastive representation learning is crucial in time series analysis as it alleviates the issue of data noise and incompleteness as well as sparsity of supervision signal. However, existing constrastive learning frameworks usually focus on intral-temporal features, which fails to fully exploit the intricate nature of time series data. To address this issue, we propose DE-TSMCL, an innovative distillation enhanced framework for long sequence time series forecasting. Specifically, we design a learnable data augmentation mechanism which adaptively learns whether to mask a timestamp to obtain optimized sub-sequences. Then, we propose a contrastive learning task with momentum update to explore inter-sample and intra-temporal correlations of time series to learn the underlying structure feature on the unlabeled time series. Meanwhile, we design a supervised task to learn more robust representations and facilitate the contrastive learning process. Finally, we jointly optimize the above two tasks. By developing model loss from multiple tasks, we can learn effective representations for downstream forecasting task. Extensive experiments, in comparison with state-of-the-arts, well demonstrate the effectiveness of DE-TSMCL, where the maximum improvement can reach to 27.3%.
- [1766] arXiv:2401.17805 (cross-list from cs.CY) [ pdf , other ]
-
Title: Biospheric AIAuthors: Marcin KoreckiSubjects: Computers and Society (cs.CY) ; Artificial Intelligence (cs.AI)
Abstract: The dominant paradigm in AI ethics and value alignment is highly anthropocentric. The focus of these disciplines is strictly on human values which limits the depth and breadth of their insights. Recently, attempts to expand to a sentientist perspective have been initiated. We argue that neither of these outlooks is sufficient to capture the actual complexity of the biosphere and ensure that AI does not damage it. Thus, we propose a new paradigm -- Biospheric AI that assumes an ecocentric perspective. We discuss hypothetical ways in which such an AI might be designed. Moreover, we give directions for research and application of the modern AI models that would be consistent with the biospheric interests. All in all, this work attempts to take first steps towards a comprehensive program of research that focuses on the interactions between AI and the biosphere.
- [1767] arXiv:2401.17809 (cross-list from cs.CL) [ pdf , other ]
-
Title: SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding AlteringAuthors: Xiaopeng Li , Shasha Li , Shezheng Song , Huijun Liu , Bin Ji , Xi Wang , Jun Ma , Jie Yu , Xiaodong Liu , Jing Wang , Weimin ZhangComments: Under review; Our code is available at this https URLSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The general capabilities of large language models (LLMs) make them the infrastructure for various AI applications, but updating their inner knowledge requires significant resources. Recent model editing is a promising technique for efficiently updating a small amount of knowledge of LLMs and has attracted much attention. In particular, local editing methods, which directly update model parameters, are more suitable for updating a small amount of knowledge. Local editing methods update weights by computing least squares closed-form solutions and identify edited knowledge by vector-level matching in inference, which achieve promising results. However, these methods still require a lot of time and resources to complete the computation. Moreover, vector-level matching lacks reliability, and such updates disrupt the original organization of the model's parameters. To address these issues, we propose an detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching and adds them to the subject word embeddings in Transformer input. To get these editing embeddings, we propose optimizing then suppressing fusion method, which first optimizes learnable embedding vectors for the editing target and then suppresses the Knowledge Embedding Dimensions (KEDs) to obtain final editing embeddings. We thus propose SWEA$\oplus$OS method for editing factual knowledge in LLMs. We demonstrate the overall state-of-the-art (SOTA) performance of SWEA$\oplus$OS on the \textsc{CounterFact} and zsRE datasets. To further validate the reasoning ability of SWEA$\oplus$OS in editing knowledge, we evaluate it on the more complex \textsc{RippleEdits} benchmark. The results demonstrate that SWEA$\oplus$OS possesses SOTA reasoning ability.
- [1768] arXiv:2401.17812 (cross-list from cs.NI) [ pdf , other ]
-
Title: Deterministic Computing Power Networking: Architecture, Technologies and ProspectsAuthors: Qingmin Jia , Yujiao Hu , Xiaomao Zhou , Qianpiao Ma , Kai Guo , Huayu Zhang , Renchao Xie , Tao Huang , Yunjie LiuSubjects: Networking and Internet Architecture (cs.NI) ; Artificial Intelligence (cs.AI)
Abstract: With the development of new Internet services such as computation-intensive and delay-sensitive tasks, the traditional "Best Effort" network transmission mode has been greatly challenged. The network system is urgently required to provide end-to-end transmission determinacy and computing determinacy for new applications to ensure the safe and efficient operation of services. Based on the research of the convergence of computing and networking, a new network paradigm named deterministic computing power networking (Det-CPN) is proposed. In this article, we firstly introduce the research advance of computing power networking. And then the motivations and scenarios of Det-CPN are analyzed. Following that, we present the system architecture, technological capabilities, workflow as well as key technologies for Det-CPN. Finally, the challenges and future trends of Det-CPN are analyzed and discussed.
- [1769] arXiv:2401.17827 (cross-list from cs.CL) [ pdf , other ]
-
Title: Neural Machine Translation for Malayalam Paraphrase GenerationSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: This study explores four methods of generating paraphrases in Malayalam, utilizing resources available for English paraphrasing and pre-trained Neural Machine Translation (NMT) models. We evaluate the resulting paraphrases using both automated metrics, such as BLEU, METEOR, and cosine similarity, as well as human annotation. Our findings suggest that automated evaluation measures may not be fully appropriate for Malayalam, as they do not consistently align with human judgment. This discrepancy underscores the need for more nuanced paraphrase evaluation approaches especially for highly agglutinative languages.
- [1770] arXiv:2401.17828 (cross-list from cs.CV) [ pdf , other ]
-
Title: Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic SegmentationComments: 7 pages, 4 figures, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: In recent years, weakly supervised semantic segmentation using image-level labels as supervision has received significant attention in the field of computer vision. Most existing methods have addressed the challenges arising from the lack of spatial information in these labels by focusing on facilitating supervised learning through the generation of pseudo-labels from class activation maps (CAMs). Due to the localized pattern detection of CNNs, CAMs often emphasize only the most discriminative parts of an object, making it challenging to accurately distinguish foreground objects from each other and the background. Recent studies have shown that Vision Transformer (ViT) features, due to their global view, are more effective in capturing the scene layout than CNNs. However, the use of hierarchical ViTs has not been extensively explored in this field. This work explores the use of Swin Transformer by proposing "SWTformer" to enhance the accuracy of the initial seed CAMs by bringing local and global views together. SWTformer-V1 generates class probabilities and CAMs using only the patch tokens as features. SWTformer-V2 incorporates a multi-scale feature fusion mechanism to extract additional information and utilizes a background-aware mechanism to generate more accurate localization maps with improved cross-object discrimination. Based on experiments on the PascalVOC 2012 dataset, SWTformer-V1 achieves a 0.98% mAP higher localization accuracy, outperforming state-of-the-art models. It also yields comparable performance by 0.82% mIoU on average higher than other methods in generating initial localization maps, depending only on the classification network. SWTformer-V2 further improves the accuracy of the generated seed CAMs by 5.32% mIoU, further proving the effectiveness of the local-to-global view provided by the Swin transformer. Code available at: this https URL
- [1771] arXiv:2401.17838 (cross-list from cs.LG) [ pdf , other ]
-
Title: A Cross-View Hierarchical Graph Learning Hypernetwork for Skill Demand-Supply Joint PredictionComments: 11 pages, 7 figures, AAAI24Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: The rapidly changing landscape of technology and industries leads to dynamic skill requirements, making it crucial for employees and employers to anticipate such shifts to maintain a competitive edge in the labor market. Existing efforts in this area either rely on domain-expert knowledge or regarding skill evolution as a simplified time series forecasting problem. However, both approaches overlook the sophisticated relationships among different skills and the inner-connection between skill demand and supply variations. In this paper, we propose a Cross-view Hierarchical Graph learning Hypernetwork (CHGH) framework for joint skill demand-supply prediction. Specifically, CHGH is an encoder-decoder network consisting of i) a cross-view graph encoder to capture the interconnection between skill demand and supply, ii) a hierarchical graph encoder to model the co-evolution of skills from a cluster-wise perspective, and iii) a conditional hyper-decoder to jointly predict demand and supply variations by incorporating historical demand-supply gaps. Extensive experiments on three real-world datasets demonstrate the superiority of the proposed framework compared to seven baselines and the effectiveness of the three modules.
- [1772] arXiv:2401.17839 (cross-list from cs.CL) [ pdf , other ]
-
Title: Global-Liar: Factuality of LLMs over Time and Geographic RegionsComments: 24 pages, 12 figures, 9 tablesSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Abstract: The increasing reliance on AI-driven solutions, particularly Large Language Models (LLMs) like the GPT series, for information retrieval highlights the critical need for their factuality and fairness, especially amidst the rampant spread of misinformation and disinformation online. Our study evaluates the factual accuracy, stability, and biases in widely adopted GPT models, including GPT-3.5 and GPT-4, contributing to reliability and integrity of AI-mediated information dissemination.
We introduce 'Global-Liar,' a dataset uniquely balanced in terms of geographic and temporal representation, facilitating a more nuanced evaluation of LLM biases. Our analysis reveals that newer iterations of GPT models do not always equate to improved performance. Notably, the GPT-4 version from March demonstrates higher factual accuracy than its subsequent June release. Furthermore, a concerning bias is observed, privileging statements from the Global North over the Global South, thus potentially exacerbating existing informational inequities. Regions such as Africa and the Middle East are at a disadvantage, with much lower factual accuracy. The performance fluctuations over time suggest that model updates may not consistently benefit all regions equally.
Our study also offers insights into the impact of various LLM configuration settings, such as binary decision forcing, model re-runs and temperature, on model's factuality. Models constrained to binary (true/false) choices exhibit reduced factuality compared to those allowing an 'unclear' option. Single inference at a low temperature setting matches the reliability of majority voting across various configurations. The insights gained highlight the need for culturally diverse and geographically inclusive model training and evaluation. This approach is key to achieving global equity in technology, distributing AI benefits fairly worldwide. - [1773] arXiv:2401.17842 (cross-list from cs.NE) [ pdf , other ]
-
Title: Explainable Benchmarking for Iterative Optimization HeuristicsComments: Submitted to ACM TELOSubjects: Neural and Evolutionary Computing (cs.NE) ; Artificial Intelligence (cs.AI)
Abstract: Benchmarking heuristic algorithms is vital to understand under which conditions and on what kind of problems certain algorithms perform well. In most current research into heuristic optimization algorithms, only a very limited number of scenarios, algorithm configurations and hyper-parameter settings are explored, leading to incomplete and often biased insights and results. This paper presents a novel approach we call explainable benchmarking. Introducing the IOH-Xplainer software framework, for analyzing and understanding the performance of various optimization algorithms and the impact of their different components and hyper-parameters. We showcase the framework in the context of two modular optimization frameworks. Through this framework, we examine the impact of different algorithmic components and configurations, offering insights into their performance across diverse scenarios. We provide a systematic method for evaluating and interpreting the behaviour and efficiency of iterative optimization heuristics in a more transparent and comprehensible manner, allowing for better benchmarking and algorithm design.
- [1774] arXiv:2401.17865 (cross-list from cs.LG) [ pdf , other ]
-
Title: Manipulating Predictions over Discrete Inputs in Machine TeachingComments: 8 pages, 2 figuresSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Machine teaching often involves the creation of an optimal (typically minimal) dataset to help a model (referred to as the `student') achieve specific goals given by a teacher. While abundant in the continuous domain, the studies on the effectiveness of machine teaching in the discrete domain are relatively limited. This paper focuses on machine teaching in the discrete domain, specifically on manipulating student models' predictions based on the goals of teachers via changing the training data efficiently. We formulate this task as a combinatorial optimization problem and solve it by proposing an iterative searching algorithm. Our algorithm demonstrates significant numerical merit in the scenarios where a teacher attempts at correcting erroneous predictions to improve the student's models, or maliciously manipulating the model to misclassify some specific samples to the target class aligned with his personal profits. Experimental results show that our proposed algorithm can have superior performance in effectively and efficiently manipulating the predictions of the model, surpassing conventional baselines.
- [1775] arXiv:2401.17866 (cross-list from cs.HC) [ pdf , ps , other ]
-
Title: Making Sense of Knowledge Intensive Processes: an Oil & Gas Industry ScenarioComments: 9 pages. This paper was presented at the Sensemaking in a Senseless World workshop during the 2018 ACM CHI Conference on Human Factors in Computing SystemsSubjects: Human-Computer Interaction (cs.HC) ; Artificial Intelligence (cs.AI)
Abstract: Sensemaking is a constant and ongoing process by which people associate meaning to experiences. It can be an individual process, known as abduction, or a group process by which people give meaning to collective experiences. The sensemaking of a group is influenced by the abduction process of each person about the experience. Every collaborative process needs some level of sensemaking to show results. For a knowledge intensive process, sensemaking is central and related to most of its tasks. We present findings from a fieldwork executed in knowledge intensive process from the Oil and Gas industry. Our findings indicated that different types of knowledge can be combined to compose the result of a sensemaking process (e.g. decision, the need for more discussion, etc.). This paper presents an initial set of knowledge types that can be combined to compose the result of the sensemaking of a collaborative decision making process. We also discuss ideas for using systems powered by Artificial Intelligence to support sensemaking processes.
- [1776] arXiv:2401.17870 (cross-list from cs.LG) [ pdf , other ]
-
Title: Efficient Subseasonal Weather Forecast using Teleconnection-informed TransformersComments: Submitted to IGARSS 2024Subjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Subseasonal forecasting, which is pivotal for agriculture, water resource management, and early warning of disasters, faces challenges due to the chaotic nature of the atmosphere. Recent advances in machine learning (ML) have revolutionized weather forecasting by achieving competitive predictive skills to numerical models. However, training such foundation models requires thousands of GPU days, which causes substantial carbon emissions and limits their broader applicability. Moreover, ML models tend to fool the pixel-wise error scores by producing smoothed results which lack physical consistency and meteorological meaning. To deal with the aforementioned problems, we propose a teleconnection-informed transformer. Our architecture leverages the pretrained Pangu model to achieve good initial weights and integrates a teleconnection-informed temporal module to improve predictability in an extended temporal range. Remarkably, by adjusting 1.1% of the Pangu model's parameters, our method enhances predictability on four surface and five upper-level atmospheric variables at a two-week lead time. Furthermore, the teleconnection-filtered features improve the spatial granularity of outputs significantly, indicating their potential physical consistency. Our research underscores the importance of atmospheric and oceanic teleconnections in driving future weather conditions. Besides, it presents a resource-efficient pathway for researchers to leverage existing foundation models on versatile downstream tasks.
- [1777] arXiv:2401.17895 (cross-list from cs.CV) [ pdf , other ]
-
Title: ReplaceAnything3D:Text-Guided 3D Scene Editing with Compositional Neural Radiance FieldsAuthors: Edward Bartrum , Thu Nguyen-Phuoc , Chris Xie , Zhengqin Li , Numair Khan , Armen Avetisyan , Douglas Lanman , Lei XiaoComments: For our project page, see this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Graphics (cs.GR)
Abstract: We introduce ReplaceAnything3D model (RAM3D), a novel text-guided 3D scene editing method that enables the replacement of specific objects within a scene. Given multi-view images of a scene, a text prompt describing the object to replace, and a text prompt describing the new object, our Erase-and-Replace approach can effectively swap objects in the scene with newly generated content while maintaining 3D consistency across multiple viewpoints. We demonstrate the versatility of ReplaceAnything3D by applying it to various realistic 3D scenes, showcasing results of modified foreground objects that are well-integrated with the rest of the scene without affecting its overall integrity.
- [1778] arXiv:2401.17914 (cross-list from cs.RO) [ pdf , other ]
-
Title: Attention Graph for Multi-Robot Social Navigation with Deep Reinforcement LearningSubjects: Robotics (cs.RO) ; Artificial Intelligence (cs.AI)
Abstract: Learning robot navigation strategies among pedestrian is crucial for domain based applications. Combining perception, planning and prediction allows us to model the interactions between robots and pedestrians, resulting in impressive outcomes especially with recent approaches based on deep reinforcement learning (RL). However, these works do not consider multi-robot scenarios. In this paper, we present MultiSoc, a new method for learning multi-agent socially aware navigation strategies using RL. Inspired by recent works on multi-agent deep RL, our method leverages graph-based representation of agent interactions, combining the positions and fields of view of entities (pedestrians and agents). Each agent uses a model based on two Graph Neural Network combined with attention mechanisms. First an edge-selector produces a sparse graph, then a crowd coordinator applies node attention to produce a graph representing the influence of each entity on the others. This is incorporated into a model-free RL framework to learn multi-agent policies. We evaluate our approach on simulation and provide a series of experiments in a set of various conditions (number of agents / pedestrians). Empirical results show that our method learns faster than social navigation deep RL mono-agent techniques, and enables efficient multi-agent implicit coordination in challenging crowd navigation with multiple heterogeneous humans. Furthermore, by incorporating customizable meta-parameters, we can adjust the neighborhood density to take into account in our navigation strategy.
- [1779] arXiv:2401.17972 (cross-list from cs.CV) [ pdf , other ]
-
Title: MelNet: A Real-Time Deep Learning Algorithm for Object DetectionComments: 11 pages, 9 figures, 5 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: In this study, a novel deep learning algorithm for object detection, named MelNet, was introduced. MelNet underwent training utilizing the KITTI dataset for object detection. Following 300 training epochs, MelNet attained an mAP (mean average precision) score of 0.732. Additionally, three alternative models -YOLOv5, EfficientDet, and Faster-RCNN-MobileNetv3- were trained on the KITTI dataset and juxtaposed with MelNet for object detection.
The outcomes underscore the efficacy of employing transfer learning in certain instances. Notably, preexisting models trained on prominent datasets (e.g., ImageNet, COCO, and Pascal VOC) yield superior results. Another finding underscores the viability of creating a new model tailored to a specific scenario and training it on a specific dataset. This investigation demonstrates that training MelNet exclusively on the KITTI dataset also surpasses EfficientDet after 150 epochs. Consequently, post-training, MelNet's performance closely aligns with that of other pre-trained models. - [1780] arXiv:2401.17975 (cross-list from cs.LG) [ pdf , other ]
-
Title: Understanding polysemanticity in neural networks through coding theorySubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI)
Abstract: Despite substantial efforts, neural network interpretability remains an elusive goal, with previous research failing to provide succinct explanations of most single neurons' impact on the network output. This limitation is due to the polysemantic nature of most neurons, whereby a given neuron is involved in multiple unrelated network states, complicating the interpretation of that neuron. In this paper, we apply tools developed in neuroscience and information theory to propose both a novel practical approach to network interpretability and theoretical insights into polysemanticity and the density of codes. We infer levels of redundancy in the network's code by inspecting the eigenspectrum of the activation's covariance matrix. Furthermore, we show how random projections can reveal whether a network exhibits a smooth or non-differentiable code and hence how interpretable the code is. This same framework explains the advantages of polysemantic neurons to learning performance and explains trends found in recent results by Elhage et al.~(2022). Our approach advances the pursuit of interpretability in neural networks, providing insights into their underlying structure and suggesting new avenues for circuit-level interpretability.
- [1781] arXiv:2401.17981 (cross-list from cs.CV) [ pdf , other ]
-
Title: Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical StudySubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Despite the impressive capabilities of Multimodal Large Language Models (MLLMs) in integrating text and image modalities, challenges remain in accurately interpreting detailed visual elements. This paper presents an empirical study on enhancing MLLMs with state-of-the-art (SOTA) object detection and Optical Character Recognition models to improve fine-grained image understanding and reduce hallucination in responses. Our research investigates the embedding-based infusion of detection information, the impact of such infusion on the MLLMs' original abilities, and the interchangeability of detection models. We conduct systematic experiments with models such as LLaVA-1.5, DINO, and PaddleOCRv2, revealing that our approach not only refines MLLMs' performance in specific visual tasks but also maintains their original strengths. The resulting enhanced MLLMs outperform SOTA models on 9 out of 10 benchmarks, achieving an improvement of up to 12.99% on the normalized average score, marking a notable advancement in multimodal understanding. We release our codes to facilitate further exploration into the fine-grained multimodal dialogue capabilities of MLLMs.
- [1782] arXiv:2401.17985 (cross-list from cs.CV) [ pdf , other ]
-
Title: Shrub of a thousand faces: an individual segmentation from satellite images using deep learningAuthors: Rohaifa Khaldi , Siham Tabik , Sergio Puertas-Ruiz , Julio Peñas de Giles , José Antonio Hódar Correa , Regino Zamora , Domingo Alcaraz SeguraComments: 39 pages, 20 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV) ; Artificial Intelligence (cs.AI)
Abstract: Monitoring the distribution and size structure of long-living shrubs, such as Juniperus communis, can be used to estimate the long-term effects of climate change on high-mountain and high latitude ecosystems. Historical aerial very-high resolution imagery offers a retrospective tool to monitor shrub growth and distribution at high precision. Currently, deep learning models provide impressive results for detecting and delineating the contour of objects with defined shapes. However, adapting these models to detect natural objects that express complex growth patterns, such as junipers, is still a challenging task.
This research presents a novel approach that leverages remotely sensed RGB imagery in conjunction with Mask R-CNN-based instance segmentation models to individually delineate Juniperus shrubs above the treeline in Sierra Nevada (Spain). In this study, we propose a new data construction design that consists in using photo interpreted (PI) and field work (FW) data to respectively develop and externally validate the model. We also propose a new shrub-tailored evaluation algorithm based on a new metric called Multiple Intersections over Ground Truth Area (MIoGTA) to assess and optimize the model shrub delineation performance. Finally, we deploy the developed model for the first time to generate a wall-to-wall map of Juniperus individuals.
The experimental results demonstrate the efficiency of our dual data construction approach in overcoming the limitations associated with traditional field survey methods. They also highlight the robustness of MIoGTA metric in evaluating instance segmentation models on species with complex growth patterns showing more resilience against data annotation uncertainty. Furthermore, they show the effectiveness of employing Mask R-CNN with ResNet101-C4 backbone in delineating PI and FW shrubs, achieving an F1-score of 87,87% and 76.86%, respectively. - [1783] arXiv:2401.18018 (cross-list from cs.LG) [ pdf , other ]
-
Title: On Prompt-Driven Safeguarding for Large Language ModelsAuthors: Chujie Zheng , Fan Yin , Hao Zhou , Fandong Meng , Jie Zhou , Kai-Wei Chang , Minlie Huang , Nanyun PengSubjects: Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: Prepending model inputs with safety prompts is a common practice for safeguarding large language models (LLMs) from complying with queries that contain harmful intents. However, the working mechanisms of safety prompts have not been revealed yet, which hinders the potential for automatically optimizing them to improve LLM safety. To this end, we investigate the impact of safety prompts from the perspective of model representations. We find that in models' representation space, harmful and harmless queries can be largely distinguished, but this is not noticeably enhanced by safety prompts. Instead, the queries' representations are moved by safety prompts in similar directions where models become more prone to refusal (i.e., refusing to provide assistance) even when the queries are harmless. Inspired by these findings, we propose a method called DRO (Directed Representation Optimization) for automatic safety prompt optimization. It treats safety prompts as continuous, trainable embeddings and learns to move the representations of harmful/harmless queries along/opposite the direction in which the model's refusal probability increases. Experiments with eight LLMs on out-of-domain benchmarks demonstrate that DRO remarkably improves the safeguarding performance of human-crafted safety prompts and outperforms strong baselines, without compromising the general model capability.
- [1784] arXiv:2401.18028 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Supporting Anticipatory Governance using LLMs: Evaluating and Aligning Large Language Models with the News Media to Anticipate the Negative Impacts of AIComments: 14 pages + research ethics and social impact statement, references, and appendix. Under conference reviewSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Abstract: Anticipating the negative impacts of emerging AI technologies is a challenge, especially in the early stages of development. An understudied approach to such anticipation is the use of LLMs to enhance and guide this process. Despite advancements in LLMs and evaluation metrics to account for biases in generated text, it is unclear how well these models perform in anticipatory tasks. Specifically, the use of LLMs to anticipate AI impacts raises questions about the quality and range of categories of negative impacts these models are capable of generating. In this paper we leverage news media, a diverse data source that is rich with normative assessments of emerging technologies, to formulate a taxonomy of impacts to act as a baseline for comparing against. By computationally analyzing thousands of news articles published by hundreds of online news domains around the world, we develop a taxonomy consisting of ten categories of AI impacts. We then evaluate both instruction-based (GPT-4 and Mistral-7B-Instruct) and fine-tuned completion models (Mistral-7B and GPT-3) using a sample from this baseline. We find that the generated impacts using Mistral-7B, fine-tuned on impacts from the news media, tend to be qualitatively on par with impacts generated using a larger scale model such as GPT-4. Moreover, we find that these LLMs generate impacts that largely reflect the taxonomy of negative impacts identified in the news media, however the impacts produced by instruction-based models had gaps in the production of certain categories of impacts in comparison to fine-tuned models. This research highlights a potential bias in state-of-the-art LLMs when used for anticipating impacts and demonstrates the advantages of aligning smaller LLMs with a diverse range of impacts, such as those reflected in the news media, to better reflect such impacts during anticipatory exercises.
- [1785] arXiv:2401.18034 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Paramanu: A Family of Novel Efficient Indic Generative Foundation Language ModelsSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: We present Gyan AI Paramanu ("atom"), a family of novel language models for Indian languages. It is a collection of auto-regressive monolingual, bilingual, and multilingual Indic language models pretrained from scratch on a single GPU for 10 Indian languages (Assamese, Bangla, Hindi, Konkani, Maithili, Marathi, Odia, Sanskrit, Tamil, Telugu) across 5 scripts (Bangla, Devanagari, Odia, Tamil, Telugu) of varying sizes ranging from 13.29M to 367.5M.The models are pretrained with a context size of 1024 on a single GPU. The models are very efficient, small, fast, and powerful. We have also developed an efficient most advanced Indic tokenizer that can even tokenize unseen languages. In order to avoid the "curse of multi-linguality" in our multilingual mParamanu model, we pretrained on comparable corpora by typological grouping using the same script. We performed human evaluation of our pretrained models for open end text generation on grammar, coherence, creativity, and factuality metrics for Bangla, Hindi, and Sanskrit. Our Bangla, Hindi, and Sanskrit models outperformed GPT-3.5-Turbo (ChatGPT), Bloom 7B, LLaMa-2 7B, OPT 6.7B, GPT-J 6B, GPTNeo 1.3B, GPT2-XL large language models (LLMs) by a large margin despite being smaller in size by 66 to 20 times compared to standard 7B LLMs. To run inference on our pretrained models, CPU is enough, and GPU is not needed. We also instruction-tuned our pretrained Bangla, Hindi, Marathi, Tamil, and Telugu models on 23k instructions in respective languages. Our pretrained and instruction-tuned models which are first of its kind, most powerful efficient small generative language models ever developed for Indic languages, and the various results lead to the conclusion that high quality generative language models are possible without high amount of compute power and humongous number of parameters. We plan to release our models at this https URL .
- [1786] arXiv:2401.18040 (cross-list from cs.CL) [ pdf , ps , other ]
-
Title: Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and AdaptabilityAuthors: Navin Kamuni , Hardik Shah , Sathishkumar Chintala , Naveen Kunchakuri , Sujatha Alla Old DominionComments: 6 pages, 1 figure, 18th IEEE International Conference on Semantic ComputingSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI)
Abstract: End-to-end multi-task dialogue systems are usually designed with separate modules for the dialogue pipeline. Among these, the policy module is essential for deciding what to do in response to user input. This policy is trained by reinforcement learning algorithms by taking advantage of an environment in which an agent receives feedback in the form of a reward signal. The current dialogue systems, however, only provide meagre and simplistic rewards. Investigating intrinsic motivation reinforcement learning algorithms is the goal of this study. Through this, the agent can quickly accelerate training and improve its capacity to judge the quality of its actions by teaching it an internal incentive system. In particular, we adapt techniques for random network distillation and curiosity-driven reinforcement learning to measure the frequency of state visits and encourage exploration by using semantic similarity between utterances. Experimental results on MultiWOZ, a heterogeneous dataset, show that intrinsic motivation-based debate systems outperform policies that depend on extrinsic incentives. By adopting random network distillation, for example, which is trained using semantic similarity between user-system dialogues, an astounding average success rate of 73% is achieved. This is a significant improvement over the baseline Proximal Policy Optimization (PPO), which has an average success rate of 60%. In addition, performance indicators such as booking rates and completion rates show a 10% rise over the baseline. Furthermore, these intrinsic incentive models help improve the system's policy's resilience in an increasing amount of domains. This implies that they could be useful in scaling up to settings that cover a wider range of domains.
- [1787] arXiv:2401.18045 (cross-list from cs.CL) [ pdf , other ]
-
Title: SpeechComposer: Unifying Multiple Speech Tasks with Prompt CompositionAuthors: Yihan Wu , Soumi Maiti , Yifan Peng , Wangyou Zhang , Chenda Li , Yuyue Wang , Xihua Wang , Shinji Watanabe , Ruihua SongComments: 11 pages, 2 figuresSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Abstract: Recent advancements in language models have significantly enhanced performance in multiple speech-related tasks. Existing speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model. However, this design omits the intrinsic connections between different speech tasks, which can potentially boost the performance of each task. In this work, we propose a novel decoder-only speech language model, SpeechComposer, that can unify common speech tasks by composing a fixed set of prompt tokens. Built upon four primary tasks -- speech synthesis, speech recognition, speech language modeling, and text language modeling -- SpeechComposer can easily extend to more speech tasks via compositions of well-designed prompt tokens, like voice conversion and speech enhancement. The unification of prompt tokens also makes it possible for knowledge sharing among different speech tasks in a more structured manner. Experimental results demonstrate that our proposed SpeechComposer can improve the performance of both primary tasks and composite tasks, showing the effectiveness of the shared prompt tokens. Remarkably, the unified decoder-only model achieves a comparable and even better performance than the baselines which are expert models designed for single tasks.
- [1788] arXiv:2401.18070 (cross-list from cs.CL) [ pdf , other ]
-
Title: Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?Authors: Andreas Opedal , Alessandro Stolfo , Haruki Shirakami , Ying Jiao , Ryan Cotterell , Bernhard Schölkopf , Abulhair Saparov , Mrinmaya SachanComments: PreprintSubjects: Computation and Language (cs.CL) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: There is increasing interest in employing large language models (LLMs) as cognitive models. For such purposes, it is central to understand which cognitive properties are well-modeled by LLMs, and which are not. In this work, we study the biases of LLMs in relation to those known in children when solving arithmetic word problems. Surveying the learning science literature, we posit that the problem-solving process can be split into three distinct steps: text comprehension, solution planning and solution execution. We construct tests for each one in order to understand which parts of this process can be faithfully modeled by current state-of-the-art LLMs. We generate a novel set of word problems for each of these tests, using a neuro-symbolic method that enables fine-grained control over the problem features. We find evidence that LLMs, with and without instruction-tuning, exhibit human-like biases in both the text-comprehension and the solution-planning steps of the solving process, but not during the final step which relies on the problem's arithmetic expressions (solution execution).
- [1789] arXiv:2401.00037 (cross-list from q-bio.BM) [ pdf , other ]
-
Title: Messenger RNA Design via Expected Partition Function and Continuous OptimizationSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: The tasks of designing RNAs are discrete optimization problems, and several versions of these problems are NP-hard. As an alternative to commonly used local search methods, we formulate these problems as continuous optimization and develop a general framework for this optimization based on a generalization of classical partition function which we call "expected partition function". The basic idea is to start with a distribution over all possible candidate sequences, and extend the objective function from a sequence to a distribution. We then use gradient descent-based optimization methods to improve the extended objective function, and the distribution will gradually shrink towards a one-hot sequence (i.e., a single sequence). As a case study, we consider the important problem of mRNA design with wide applications in vaccines and therapeutics. While the recent work of LinearDesign can efficiently optimize mRNAs for minimum free energy (MFE), optimizing for ensemble free energy is much harder and likely intractable. Our approach can consistently improve over the LinearDesign solution in terms of ensemble free energy, with bigger improvements on longer sequences.
- [1790] arXiv:2401.00225 (cross-list from eess.AS) [ pdf , ps , other ]
-
Title: Enhancing dysarthria speech feature representation with empirical mode decomposition and Walsh-Hadamard transformSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: Dysarthria speech contains the pathological characteristics of vocal tract and vocal fold, but so far, they have not yet been included in traditional acoustic feature sets. Moreover, the nonlinearity and non-stationarity of speech have been ignored. In this paper, we propose a feature enhancement algorithm for dysarthria speech called WHFEMD. It combines empirical mode decomposition (EMD) and fast Walsh-Hadamard transform (FWHT) to enhance features. With the proposed algorithm, the fast Fourier transform of the dysarthria speech is first performed and then followed by EMD to get intrinsic mode functions (IMFs). After that, FWHT is used to output new coefficients and to extract statistical features based on IMFs, power spectral density, and enhanced gammatone frequency cepstral coefficients. To evaluate the proposed approach, we conducted experiments on two public pathological speech databases including UA Speech and TORGO. The results show that our algorithm performed better than traditional features in classification. We achieved improvements of 13.8% (UA Speech) and 3.84% (TORGO), respectively. Furthermore, the incorporation of an imbalanced classification algorithm to address data imbalance has resulted in a 12.18% increase in recognition accuracy. This algorithm effectively addresses the challenges of the imbalanced dataset and non-linearity in dysarthric speech and simultaneously provides a robust representation of the local pathological features of the vocal folds and tracts.
- [1791] arXiv:2401.00428 (cross-list from hep-ex) [ pdf , other ]
-
Title: Training towards significance with the decorrelated event classifier transformer neural networkAuthors: Jaebak KimComments: 17 pages, 7 figures, Submitted to journalSubjects: High Energy Physics - Experiment (hep-ex) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Experimental particle physics uses machine learning for many of tasks, where one application is to classify signal and background events. The classification can be used to bin an analysis region to enhance the expected significance for a mass resonance search. In natural language processing, one of the leading neural network architectures is the transformer. In this work, an event classifier transformer is proposed to bin an analysis region, in which the network is trained with special techniques. The techniques developed here can enhance the significance and reduce the correlation between the network's output and the reconstructed mass. It is found that this trained network can perform better than boosted decision trees and feed-forward networks.
- [1792] arXiv:2401.00499 (cross-list from physics.chem-ph) [ pdf , ps , other ]
-
Title: Generating High-Precision Force Fields for Molecular Dynamics Simulations to Study Chemical Reaction Mechanisms using Molecular Configuration TransformerAuthors: Sihao Yuan , Xu Han , Jun Zhang , Zhaoxin Xie , Cheng Fan , Yunlong Xiao , Yi Qin Gao , Yi Isaac YangSubjects: Chemical Physics (physics.chem-ph) ; Soft Condensed Matter (cond-mat.soft); Artificial Intelligence (cs.AI)
Abstract: Theoretical studies on chemical reaction mechanisms have been crucial in organic chemistry. Traditionally, calculating the manually constructed molecular conformations of transition states for chemical reactions using quantum chemical calculations is the most commonly used method. However, this way is heavily dependent on individual experience and chemical intuition. In our previous study, we proposed a research paradigm that uses enhanced sampling in molecular dynamics simulations to study chemical reactions. This approach can directly simulate the entire process of a chemical reaction. However, the computational speed limits the use of high-precision potential energy functions for simulations. To address this issue, we present a scheme for training high-precision force fields for molecular modeling using a previously developed graph-neural-network-based molecular model, molecular configuration transformer. This potential energy function allows for highly accurate simulations at a low computational cost, leading to more precise calculations of the mechanism of chemical reactions. We applied this approach to study a Claisen rearrangement reaction and a Carbonyl insertion reaction catalyzed by Manganese.
- [1793] arXiv:2401.00587 (cross-list from eess.IV) [ pdf , other ]
-
Title: Brain Tumor Segmentation Based on Deep Learning, Attention Mechanisms, and Energy-Based Uncertainty PredictionComments: 11 pages, 6 figures, code available at this https URL , submitted to Computers in Biology and MedicineSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI)
Abstract: Brain tumors are one of the deadliest forms of cancer with a mortality rate of over 80%. A quick and accurate diagnosis is crucial to increase the chance of survival. However, in medical analysis, the manual annotation and segmentation of a brain tumor can be a complicated task. Multiple MRI modalities are typically analyzed as they provide unique information regarding the tumor regions. Although these MRI modalities are helpful for segmenting gliomas, they tend to increase overfitting and computation. This paper proposes a region of interest detection algorithm that is implemented during data preprocessing to locate salient features and remove extraneous MRI data. This decreases the input size, allowing for more aggressive data augmentations and deeper neural networks. Following the preprocessing of the MRI modalities, a fully convolutional autoencoder with soft attention segments the different brain MRIs. When these deep learning algorithms are implemented in practice, analysts and physicians cannot differentiate between accurate and inaccurate predictions. Subsequently, test time augmentations and an energy-based model were used for voxel-based uncertainty predictions. Experimentation was conducted on the BraTS benchmarks and achieved state-of-the-art segmentation performance. Additionally, qualitative results were used to assess the segmentation models and uncertainty predictions.
- [1794] arXiv:2401.00611 (cross-list from stat.ML) [ pdf , other ]
-
Title: A Compact Representation for Bayesian Neural Networks By Removing Permutation SymmetryComments: Accepted at NeurIPS 2023 Workshop on Unifying Representations in Neural Models; 4 pages + appendixSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Bayesian neural networks (BNNs) are a principled approach to modeling predictive uncertainties in deep learning, which are important in safety-critical applications. Since exact Bayesian inference over the weights in a BNN is intractable, various approximate inference methods exist, among which sampling methods such as Hamiltonian Monte Carlo (HMC) are often considered the gold standard. While HMC provides high-quality samples, it lacks interpretable summary statistics because its sample mean and variance is meaningless in neural networks due to permutation symmetry. In this paper, we first show that the role of permutations can be meaningfully quantified by a number of transpositions metric. We then show that the recently proposed rebasin method allows us to summarize HMC samples into a compact representation that provides a meaningful explicit uncertainty estimate for each weight in a neural network, thus unifying sampling methods with variational inference. We show that this compact representation allows us to compare trained BNNs directly in weight space across sampling methods and variational inference, and to efficiently prune neural networks trained without explicit Bayesian frameworks by exploiting uncertainty estimates from HMC.
- [1795] arXiv:2401.00916 (cross-list from math.DS) [ pdf , other ]
-
Title: Data Assimilation in Chaotic Systems Using Deep Reinforcement LearningAuthors: Mohamad Abed El Rahman Hammoud , Naila Raboudi , Edriss S. Titi , Omar Knio , Ibrahim HoteitSubjects: Dynamical Systems (math.DS) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Atmospheric and Oceanic Physics (physics.ao-ph)
Abstract: Data assimilation (DA) plays a pivotal role in diverse applications, ranging from climate predictions and weather forecasts to trajectory planning for autonomous vehicles. A prime example is the widely used ensemble Kalman filter (EnKF), which relies on linear updates to minimize variance among the ensemble of forecast states. Recent advancements have seen the emergence of deep learning approaches in this domain, primarily within a supervised learning framework. However, the adaptability of such models to untrained scenarios remains a challenge. In this study, we introduce a novel DA strategy that utilizes reinforcement learning (RL) to apply state corrections using full or partial observations of the state variables. Our investigation focuses on demonstrating this approach to the chaotic Lorenz '63 system, where the agent's objective is to minimize the root-mean-squared error between the observations and corresponding forecast states. Consequently, the agent develops a correction strategy, enhancing model forecasts based on available system state observations. Our strategy employs a stochastic action policy, enabling a Monte Carlo-based DA framework that relies on randomly sampling the policy to generate an ensemble of assimilated realizations. Results demonstrate that the developed RL algorithm performs favorably when compared to the EnKF. Additionally, we illustrate the agent's capability to assimilate non-Gaussian data, addressing a significant limitation of the EnKF.
- [1796] arXiv:2401.01001 (cross-list from q-bio.NC) [ pdf , ps , other ]
-
Title: Metalearning-Informed Competence in Children: Implications for Responsible Brain-Inspired Artificial IntelligenceAuthors: Chaitanya SinghComments: 27 pages, 3 figuresSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI)
Abstract: This paper offers a novel conceptual framework comprising four essential cognitive mechanisms that operate concurrently and collaboratively to enable metalearning (knowledge and regulation of learning) strategy implementation in young children. A roadmap incorporating the core mechanisms and the associated strategies is presented as an explanation of the developing brain's remarkable cross-context learning competence. The tetrad of fundamental complementary processes is chosen to collectively represent the bare-bones metalearning architecture that can be extended to artificial intelligence (AI) systems emulating brain-like learning and problem-solving skills. Utilizing the metalearning-enabled young mind as a model for brain-inspired computing, this work further discusses important implications for morally grounded AI.
- [1797] arXiv:2401.01056 (cross-list from eess.SP) [ pdf , other ]
-
Title: Enhancing Automatic Modulation Recognition through Robust Global Feature ExtractionComments: submitted to IEEE Transactions on Vehicular Technology, 14 pages, 11 figuresSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Automatic Modulation Recognition (AMR) plays a crucial role in wireless communication systems. Deep learning AMR strategies have achieved tremendous success in recent years. Modulated signals exhibit long temporal dependencies, and extracting global features is crucial in identifying modulation schemes. Traditionally, human experts analyze patterns in constellation diagrams to classify modulation schemes. Classical convolutional-based networks, due to their limited receptive fields, excel at extracting local features but struggle to capture global relationships. To address this limitation, we introduce a novel hybrid deep framework named TLDNN, which incorporates the architectures of the transformer and long short-term memory (LSTM). We utilize the self-attention mechanism of the transformer to model the global correlations in signal sequences while employing LSTM to enhance the capture of temporal dependencies. To mitigate the impact like RF fingerprint features and channel characteristics on model generalization, we propose data augmentation strategies known as segment substitution (SS) to enhance the model's robustness to modulation-related features. Experimental results on widely-used datasets demonstrate that our method achieves state-of-the-art performance and exhibits significant advantages in terms of complexity. Our proposed framework serves as a foundational backbone that can be extended to different datasets. We have verified the effectiveness of our augmentation approach in enhancing the generalization of the models, particularly in few-shot scenarios. Code is available at \url{ this https URL }.
- [1798] arXiv:2401.01099 (cross-list from eess.AS) [ pdf , other ]
-
Title: Efficient Parallel Audio Generation using Group Masked Language ModelingComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We present a fast and high-quality codec language model for parallel audio generation. While SoundStorm, a state-of-the-art parallel audio generation model, accelerates inference speed compared to autoregressive models, it still suffers from slow inference due to iterative sampling. To resolve this problem, we propose Group-Masked Language Modeling~(G-MLM) and Group Iterative Parallel Decoding~(G-IPD) for efficient parallel audio generation. Both the training and sampling schemes enable the model to synthesize high-quality audio with a small number of iterations by effectively modeling the group-wise conditional dependencies. In addition, our model employs a cross-attention-based architecture to capture the speaker style of the prompt voice and improves computational efficiency. Experimental results demonstrate that our proposed model outperforms the baselines in prompt-based audio generation.
- [1799] arXiv:2401.01104 (cross-list from astro-ph.SR) [ pdf , other ]
-
Title: AI-FLARES: Artificial Intelligence for the Analysis of Solar Flares DataAuthors: Michele Piana , Federico Benvenuto , Anna Maria Massone , Cristina Campi , Sabrina Guastavino , Francesco Marchetti , Paolo Massa , Emma Perracchione , Anna VolparaSubjects: Solar and Stellar Astrophysics (astro-ph.SR) ; Artificial Intelligence (cs.AI)
Abstract: AI-FLARES (Artificial Intelligence for the Analysis of Solar Flares Data) is a research project funded by the Agenzia Spaziale Italiana and by the Istituto Nazionale di Astrofisica within the framework of the ``Attività di Studio per la Comunità Scientifica Nazionale Sole, Sistema Solare ed Esopianeti'' program. The topic addressed by this project was the development and use of computational methods for the analysis of remote sensing space data associated to solar flare emission. This paper overviews the main results obtained by the project, with specific focus on solar flare forecasting, reconstruction of morphologies of the flaring sources, and interpretation of acceleration mechanisms triggered by solar flares.
- [1800] arXiv:2401.01349 (cross-list from physics.soc-ph) [ pdf , ps , other ]
-
Title: The Anatomy Spread of Online Opinion Polarization: The Pivotal Role of Super-Spreaders in Social NetworksAuthors: Yasuko KawahataComments: Detection Method for Social Subsets Consisting of Anti-Network Construction for Unilateral Preference Behavior on Directed Temporal Networks(2023)Subjects: Physics and Society (physics.soc-ph) ; Artificial Intelligence (cs.AI)
Abstract: The study investigates the role of 'superspreaders' in shaping opinions within networks, distinguishing three types: A, B, and C. Type A has a significant influence in shaping opinions, Type B acts as a counterbalance to A, and Type C functions like media, providing an objective viewpoint and potentially regulating A and B's influence. The research uses a confidence coefficient and z-score to survey superspreaders' behaviors, with a focus on the conditions affecting group dynamics and opinion formation, including environmental factors and forgetfulness over time. The findings offer insights for improving online communication security and understanding social influence.
- [1801] arXiv:2401.01364 (cross-list from q-bio.NC) [ pdf , other ]
-
Title: Multi-Modal Cognitive Maps based on Neural Networks trained on Successor RepresentationsSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Abstract: Cognitive maps are a proposed concept on how the brain efficiently organizes memories and retrieves context out of them. The entorhinal-hippocampal complex is heavily involved in episodic and relational memory processing, as well as spatial navigation and is thought to built cognitive maps via place and grid cells. To make use of the promising properties of cognitive maps, we set up a multi-modal neural network using successor representations which is able to model place cell dynamics and cognitive map representations. Here, we use multi-modal inputs consisting of images and word embeddings. The network learns the similarities between novel inputs and the training database and therefore the representation of the cognitive map successfully. Subsequently, the prediction of the network can be used to infer from one modality to another with over $90\%$ accuracy. The proposed method could therefore be a building block to improve current AI systems for better understanding of the environment and the different modalities in which objects appear. The association of specific modalities with certain encounters can therefore lead to context awareness in novel situations when similar encounters with less information occur and additional information can be inferred from the learned cognitive map. Cognitive maps, as represented by the entorhinal-hippocampal complex in the brain, organize and retrieve context from memories, suggesting that large language models (LLMs) like ChatGPT could harness similar architectures to function as a high-level processing center, akin to how the hippocampus operates within the cortex hierarchy. Finally, by utilizing multi-modal inputs, LLMs can potentially bridge the gap between different forms of data (like images and words), paving the way for context-awareness and grounding of abstract concepts through learned associations, addressing the grounding problem in AI.
- [1802] arXiv:2401.01383 (cross-list from q-bio.NC) [ pdf , other ]
-
Title: Predicting Infant Brain Connectivity with Federated Multi-Trajectory GNNs using Scarce DataSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: The understanding of the convoluted evolution of infant brain networks during the first postnatal year is pivotal for identifying the dynamics of early brain connectivity development. Existing deep learning solutions suffer from three major limitations. First, they cannot generalize to multi-trajectory prediction tasks, where each graph trajectory corresponds to a particular imaging modality or connectivity type (e.g., T1-w MRI). Second, existing models require extensive training datasets to achieve satisfactory performance which are often challenging to obtain. Third, they do not efficiently utilize incomplete time series data. To address these limitations, we introduce FedGmTE-Net++, a federated graph-based multi-trajectory evolution network. Using the power of federation, we aggregate local learnings among diverse hospitals with limited datasets. As a result, we enhance the performance of each hospital's local generative model, while preserving data privacy. The three key innovations of FedGmTE-Net++ are: (i) presenting the first federated learning framework specifically designed for brain multi-trajectory evolution prediction in a data-scarce environment, (ii) incorporating an auxiliary regularizer in the local objective function to exploit all the longitudinal brain connectivity within the evolution trajectory and maximize data utilization, (iii) introducing a two-step imputation process, comprising a preliminary KNN-based precompletion followed by an imputation refinement step that employs regressors to improve similarity scores and refine imputations. Our comprehensive experimental results showed the outperformance of FedGmTE-Net++ in brain multi-trajectory prediction from a single baseline graph in comparison with benchmark methods.
- [1803] arXiv:2401.01489 (cross-list from q-bio.NC) [ pdf , other ]
-
Title: The Neuron as a Direct Data-Driven ControllerAuthors: Jason Moore , Alexander Genkin , Magnus Tournoy , Joshua Pughe-Sanford , Rob R. de Ruyter van Steveninck , Dmitri B. ChklovskiiSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: In the quest to model neuronal function amidst gaps in physiological data, a promising strategy is to develop a normative theory that interprets neuronal physiology as optimizing a computational objective. This study extends the current normative models, which primarily optimize prediction, by conceptualizing neurons as optimal feedback controllers. We posit that neurons, especially those beyond early sensory areas, act as controllers, steering their environment towards a specific desired state through their output. This environment comprises both synaptically interlinked neurons and external motor sensory feedback loops, enabling neurons to evaluate the effectiveness of their control via synaptic feedback. Utilizing the novel Direct Data-Driven Control (DD-DC) framework, we model neurons as biologically feasible controllers which implicitly identify loop dynamics, infer latent states and optimize control. Our DD-DC neuron model explains various neurophysiological phenomena: the shift from potentiation to depression in Spike-Timing-Dependent Plasticity (STDP) with its asymmetry, the duration and adaptive nature of feedforward and feedback neuronal filters, the imprecision in spike generation under constant stimulation, and the characteristic operational variability and noise in the brain. Our model presents a significant departure from the traditional, feedforward, instant-response McCulloch-Pitts-Rosenblatt neuron, offering a novel and biologically-informed fundamental unit for constructing neural networks.
- [1804] arXiv:2401.01496 (cross-list from eess.IV) [ pdf , other ]
-
Title: From Pixel to Slide image: Polarization Modality-based Pathological Diagnosis Using Representation LearningSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Thyroid cancer is the most common endocrine malignancy, and accurately distinguishing between benign and malignant thyroid tumors is crucial for developing effective treatment plans in clinical practice. Pathologically, thyroid tumors pose diagnostic challenges due to improper specimen sampling. In this study, we have designed a three-stage model using representation learning to integrate pixel-level and slice-level annotations for distinguishing thyroid tumors. This structure includes a pathology structure recognition method to predict structures related to thyroid tumors, an encoder-decoder network to extract pixel-level annotation information by learning the feature representations of image blocks, and an attention-based learning mechanism for the final classification task. This mechanism learns the importance of different image blocks in a pathological region, globally considering the information from each block. In the third stage, all information from the image blocks in a region is aggregated using attention mechanisms, followed by classification to determine the category of the region. Experimental results demonstrate that our proposed method can predict microscopic structures more accurately. After color-coding, the method achieves results on unstained pathology slides that approximate the quality of Hematoxylin and eosin staining, reducing the need for stained pathology slides. Furthermore, by leveraging the concept of indirect measurement and extracting polarized features from structures correlated with lesions, the proposed method can also classify samples where membrane structures cannot be obtained through sampling, providing a potential objective and highly accurate indirect diagnostic technique for thyroid tumors.
- [1805] arXiv:2401.01789 (cross-list from stat.ML) [ pdf , other ]
-
Title: Deep learning the Hurst parameter of linear fractional processes and assessing its reliabilitySubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This research explores the reliability of deep learning, specifically Long Short-Term Memory (LSTM) networks, for estimating the Hurst parameter in fractional stochastic processes. The study focuses on three types of processes: fractional Brownian motion (fBm), fractional Ornstein-Uhlenbeck (fOU) process, and linear fractional stable motions (lfsm). The work involves a fast generation of extensive datasets for fBm and fOU to train the LSTM network on a large volume of data in a feasible time. The study analyses the accuracy of the LSTM network's Hurst parameter estimation regarding various performance measures like RMSE, MAE, MRE, and quantiles of the absolute and relative errors. It finds that LSTM outperforms the traditional statistical methods in the case of fBm and fOU processes; however, it has limited accuracy on lfsm processes. The research also delves into the implications of training length and valuation sequence length on the LSTM's performance. The methodology is applied by estimating the Hurst parameter in Li-ion battery degradation data and obtaining confidence bounds for the estimation. The study concludes that while deep learning methods show promise in parameter estimation of fractional processes, their effectiveness is contingent on the process type and the quality of training data.
- [1806] arXiv:2401.01792 (cross-list from eess.AS) [ pdf , other ]
-
Title: CoMoSVC: Consistency Model-based Singing Voice ConversionSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
Abstract: The diffusion-based Singing Voice Conversion (SVC) methods have achieved remarkable performances, producing natural audios with high similarity to the target timbre. However, the iterative sampling process results in slow inference speed, and acceleration thus becomes crucial. In this paper, we propose CoMoSVC, a consistency model-based SVC method, which aims to achieve both high-quality generation and high-speed sampling. A diffusion-based teacher model is first specially designed for SVC, and a student model is further distilled under self-consistency properties to achieve one-step sampling. Experiments on a single NVIDIA GTX4090 GPU reveal that although CoMoSVC has a significantly faster inference speed than the state-of-the-art (SOTA) diffusion-based SVC system, it still achieves comparable or superior conversion performance based on both subjective and objective metrics. Audio samples and codes are available at this https URL .
- [1807] arXiv:2401.02124 (cross-list from q-bio.BM) [ pdf , ps , other ]
-
Title: ACP-ESM: A novel framework for classification of anticancer peptides using protein-oriented transformer approachSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
Abstract: Anticancer peptides (ACPs) are a class of molecules that have gained significant attention in the field of cancer research and therapy. ACPs are short chains of amino acids, the building blocks of proteins, and they possess the ability to selectively target and kill cancer cells. One of the key advantages of ACPs is their ability to selectively target cancer cells while sparing healthy cells to a greater extent. This selectivity is often attributed to differences in the surface properties of cancer cells compared to normal cells. That is why ACPs are being investigated as potential candidates for cancer therapy. ACPs may be used alone or in combination with other treatment modalities like chemotherapy and radiation therapy. While ACPs hold promise as a novel approach to cancer treatment, there are challenges to overcome, including optimizing their stability, improving selectivity, and enhancing their delivery to cancer cells, continuous increasing in number of peptide sequences, developing a reliable and precise prediction model. In this work, we propose an efficient transformer-based framework to identify anticancer peptides for by performing accurate a reliable and precise prediction model. For this purpose, four different transformer models, namely ESM, ProtBert, BioBERT, and SciBERT are employed to detect anticancer peptides from amino acid sequences. To demonstrate the contribution of the proposed framework, extensive experiments are carried on widely-used datasets in the literature, two versions of AntiCp2, cACP-DeepGram, ACP-740. Experiment results show the usage of proposed model enhances classification accuracy when compared to the state-of-the-art studies. The proposed framework, ESM, exhibits 96.45 of accuracy for AntiCp2 dataset, 97.66 of accuracy for cACP-DeepGram dataset, and 88.51 of accuracy for ACP-740 dataset, thence determining new state-of-the-art.
- [1808] arXiv:2401.02509 (cross-list from q-bio.NC) [ pdf , other ]
-
Title: Memory, Consciousness and Large Language ModelSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: With the development in cognitive science and Large Language Models (LLMs), increasing connections have come to light between these two distinct fields. Building upon these connections, we propose a conjecture suggesting the existence of a duality between LLMs and Tulving's theory of memory. We identify a potential correspondence between Tulving's synergistic ecphory model (SEM) of retrieval and the emergent abilities observed in LLMs, serving as supporting evidence for our conjecture. Furthermore, we speculate that consciousness may be considered a form of emergent ability based on this duality. We also discuss how other theories of consciousness intersect with our research.
- [1809] arXiv:2401.02540 (cross-list from cond-mat.mtrl-sci) [ pdf , other ]
-
Title: DISO: A Domain Ontology for Modeling Dislocations in Crystalline MaterialsSubjects: Materials Science (cond-mat.mtrl-sci) ; Artificial Intelligence (cs.AI)
Abstract: Crystalline materials, such as metals and semiconductors, nearly always contain a special defect type called dislocation. This defect decisively determines many important material properties, e.g., strength, fracture toughness, or ductility. Over the past years, significant effort has been put into understanding dislocation behavior across different length scales via experimental characterization techniques and simulations. This paper introduces the dislocation ontology (DISO), which defines the concepts and relationships related to linear defects in crystalline materials. We developed DISO using a top-down approach in which we start defining the most general concepts in the dislocation domain and subsequent specialization of them. DISO is published through a persistent URL following W3C best practices for publishing Linked Data. Two potential use cases for DISO are presented to illustrate its usefulness in the dislocation dynamics domain. The evaluation of the ontology is performed in two directions, evaluating the success of the ontology in modeling a real-world domain and the richness of the ontology.
- [1810] arXiv:2401.02589 (cross-list from astro-ph.HE) [ pdf , other ]
-
Title: Identification of 4FGL uncertain sources at Higher Resolutions with Inverse Discrete Wavelet TransformSubjects: High Energy Astrophysical Phenomena (astro-ph.HE) ; Artificial Intelligence (cs.AI)
Abstract: In the forthcoming era of big astronomical data, it is a burden to find out target sources from ground-based and space-based telescopes. Although Machine Learning (ML) methods have been extensively utilized to address this issue, the incorporation of in-depth data analysis can significantly enhance the efficiency of identifying target sources when dealing with massive volumes of astronomical data. In this work, we focused on the task of finding AGN candidates and identifying BL Lac/FSRQ candidates from the 4FGL DR3 uncertain sources. We studied the correlations among the attributes of the 4FGL DR3 catalogue and proposed a novel method, named FDIDWT, to transform the original data. The transformed dataset is characterized as low-dimensional and feature-highlighted, with the estimation of correlation features by Fractal Dimension (FD) theory and the multi-resolution analysis by Inverse Discrete Wavelet Transform (IDWT). Combining the FDIDWT method with an improved lightweight MatchboxConv1D model, we accomplished two missions: (1) to distinguish the Active Galactic Nuclei (AGNs) from others (Non-AGNs) in the 4FGL DR3 uncertain sources with an accuracy of 96.65%, namely, Mission A; (2) to classify blazar candidates of uncertain type (BCUs) into BL Lacertae objects (BL Lacs) or Flat Spectrum Radio Quasars (FSRQs) with an accuracy of 92.03%, namely, Mission B. There are 1354 AGN candidates in Mission A, 482 BL Lacs candidates and 128 FSRQ candidates in Mission B were found. The results show a high consistency of greater than 98% with the results in previous works. In addition, our method has the advantage of finding less variable and relatively faint sources than ordinary methods.
- [1811] arXiv:2401.02673 (cross-list from eess.AS) [ pdf , other ]
-
Title: A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end modelAuthors: Dongdi Zhao , Jianbo Ma , Lu Lu , Jinke Li , Xuan Ji , Lei Zhu , Fuming Fang , Ming Liu , Feijun JiangSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Sound (cs.SD)
Abstract: Far-field speech recognition is a challenging task that conventionally uses signal processing beamforming to attack noise and interference problem. But the performance has been found usually limited due to heavy reliance on environmental assumption. In this paper, we propose a unified multichannel far-field speech recognition system that combines the neural beamforming and transformer-based Listen, Spell, Attend (LAS) speech recognition system, which extends the end-to-end speech recognition system further to include speech enhancement. Such framework is then jointly trained to optimize the final objective of interest. Specifically, factored complex linear projection (fCLP) has been adopted to form the neural beamforming. Several pooling strategies to combine look directions are then compared in order to find the optimal approach. Moreover, information of the source direction is also integrated in the beamforming to explore the usefulness of source direction as a prior, which is usually available especially in multi-modality scenario. Experiments on different microphone array geometry are conducted to evaluate the robustness against spacing variance of microphone array. Large in-house databases are used to evaluate the effectiveness of the proposed framework and the proposed method achieve 19.26\% improvement when compared with a strong baseline.
- [1812] arXiv:2401.02839 (cross-list from eess.AS) [ pdf , ps , other ]
-
Title: Pheme: Efficient and Conversational Speech GenerationSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Abstract: In recent years, speech generation has seen remarkable progress, now achieving one-shot generation capability that is often virtually indistinguishable from real human voice. Integrating such advancements in speech generation with large language models might revolutionize a wide range of applications. However, certain applications, such as assistive conversational systems, require natural and conversational speech generation tools that also operate efficiently in real time. Current state-of-the-art models like VALL-E and SoundStorm, powered by hierarchical neural audio codecs, require large neural components and extensive training data to work well. In contrast, MQTTS aims to build more compact conversational TTS models while capitalizing on smaller-scale real-life conversational speech data. However, its autoregressive nature yields high inference latency and thus limits its real-time usage. In order to mitigate the current limitations of the state-of-the-art TTS models while capitalizing on their strengths, in this work we introduce the Pheme model series that 1) offers compact yet high-performing models, 2) allows for parallel speech generation of 3) natural conversational speech, and 4) it can be trained efficiently on smaller-scale conversational data, cutting data demands by more than 10x but still matching the quality of the autoregressive TTS models. We also show that through simple teacher-student distillation we can meet significant improvements in voice quality for single-speaker setups on top of pretrained Pheme checkpoints, relying solely on synthetic speech generated by much larger teacher models. Audio samples and pretrained models are available online.
- [1813] arXiv:2401.02873 (cross-list from math.OC) [ pdf , other ]
-
Title: Optimal Chaining of Vehicle Plans with Time WindowsComments: 26 pages, 7 figures, submitted to "Transportation Research Part C: Emerging Technologies" journalSubjects: Optimization and Control (math.OC) ; Artificial Intelligence (cs.AI)
Abstract: For solving problems from the domain of Mobility-on-Demand (MoD), we often need to connect vehicle plans into plans spanning longer time, a process we call plan chaining. As we show in this work, chaining of the plans can be used to reduce the size of MoD providers' fleet (fleet-sizing problem) but also to reduce the total driven distance by providing high-quality vehicle dispatching solutions in MoD systems. Recently, a solution that uses this principle has been proposed to solve the fleet-sizing problem. The method does not consider the time flexibility of the plans. Instead, plans are fixed in time and cannot be delayed. However, time flexibility is an essential property of all vehicle problems with time windows. This work presents a new plan chaining formulation that considers delays as allowed by the time windows and a solution method for solving it. Moreover, we prove that the proposed plan chaining method is optimal, and we analyze its complexity. Finally, we list some practical applications and perform a demonstration for one of them: a new heuristic vehicle dispatching method for solving the static dial-a-ride problem. The demonstration results show that our proposed method provides a better solution than the two heuristic baselines for the majority of instances that cannot be solved optimally. At the same time, our method does not have the largest computational time requirements compared to the baselines. Therefore, we conclude that the proposed optimal chaining method provides not only theoretically sound results but is also practically applicable.
- [1814] arXiv:2401.02884 (cross-list from eess.IV) [ pdf , ps , other ]
-
Title: MsDC-DEQ-Net: Deep Equilibrium Model (DEQ) with Multi-scale Dilated Convolution for Image Compressive Sensing (CS)Comments: 15 pages, 8 figures, open access journal paperSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI)
Abstract: Compressive sensing (CS) is a technique that enables the recovery of sparse signals using fewer measurements than traditional sampling methods. To address the computational challenges of CS reconstruction, our objective is to develop an interpretable and concise neural network model for reconstructing natural images using CS. We achieve this by mapping one step of the iterative shrinkage thresholding algorithm (ISTA) to a deep network block, representing one iteration of ISTA. To enhance learning ability and incorporate structural diversity, we integrate aggregated residual transformations (ResNeXt) and squeeze-and-excitation (SE) mechanisms into the ISTA block. This block serves as a deep equilibrium layer, connected to a semi-tensor product network (STP-Net) for convenient sampling and providing an initial reconstruction. The resulting model, called MsDC-DEQ-Net, exhibits competitive performance compared to state-of-the-art network-based methods. It significantly reduces storage requirements compared to deep unrolling methods, using only one iteration block instead of multiple iterations. Unlike deep unrolling models, MsDC-DEQ-Net can be iteratively used, gradually improving reconstruction accuracy while considering computation trade-offs. Additionally, the model benefits from multi-scale dilated convolutions, further enhancing performance.
- [1815] arXiv:2401.02962 (cross-list from eess.IV) [ pdf , other ]
-
Title: Automated Localization of Blood Vessels in Retinal ImagesAuthors: Vahid Mohammadi SafarzadehSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Vessel structure is one of the most important parts of the retina which physicians can detect many diseases by analysing its features. Localization of blood vessels in retina images is an important process in medical image analysis. This process is also more challenging with the presence of bright and dark lesions. In this thesis, two automated vessel localization methods to handle both healthy and unhealthy (pathological) retina images are analyzed. Each method consists of two major steps and the second step is the same in the two methods. In the first step, an algorithm is used to decrease the effect of bright lesions. In Method 1, this algorithm is based on K- Means segmentation, and in Method 2, it is based on a regularization procedure. In the second step of both methods, a multi-scale line operator is used to localize the line-shaped vascular structures and ignore the dark lesions which are generally assumed to have irregular patterns. After the introduction of the methods, a detailed quantitative and qualitative comparison of the methods with one another as well as the state-of-the-art solutions in the literature based on the segmentation results on the images of the two publicly available datasets, DRIVE and STARE, is reported. The results demonstrate that the methods are highly comparable with other solutions.
- [1816] arXiv:2401.03244 (cross-list from math.OC) [ pdf , other ]
-
Title: Artificial Intelligence for Operations Research: Revolutionizing the Operations Research ProcessSubjects: Optimization and Control (math.OC) ; Artificial Intelligence (cs.AI)
Abstract: The rapid advancement of artificial intelligence (AI) techniques has opened up new opportunities to revolutionize various fields, including operations research (OR). This survey paper explores the integration of AI within the OR process (AI4OR) to enhance its effectiveness and efficiency across multiple stages, such as parameter generation, model formulation, and model optimization. By providing a comprehensive overview of the state-of-the-art and examining the potential of AI to transform OR, this paper aims to inspire further research and innovation in the development of AI-enhanced OR methods and tools. The synergy between AI and OR is poised to drive significant advancements and novel solutions in a multitude of domains, ultimately leading to more effective and efficient decision-making.
- [1817] arXiv:2401.03302 (cross-list from eess.IV) [ pdf , other ]
-
Title: Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiTComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Abstract: In the field of medical sciences, reliable detection and classification of brain tumors from images remains a formidable challenge due to the rarity of tumors within the population of patients. Therefore, the ability to detect tumors in anomaly scenarios is paramount for ensuring timely interventions and improved patient outcomes. This study addresses the issue by leveraging deep learning (DL) techniques to detect and classify brain tumors in challenging situations. The curated data set from the National Brain Mapping Lab (NBML) comprises 81 patients, including 30 Tumor cases and 51 Normal cases. The detection and classification pipelines are separated into two consecutive tasks. The detection phase involved comprehensive data analysis and pre-processing to modify the number of image samples and the number of patients of each class to anomaly distribution (9 Normal per 1 Tumor) to comply with real world scenarios. Next, in addition to common evaluation metrics for the testing, we employed a novel performance evaluation method called Patient to Patient (PTP), focusing on the realistic evaluation of the model. In the detection phase, we fine-tuned a YOLOv8n detection model to detect the tumor region. Subsequent testing and evaluation yielded competitive performance both in Common Evaluation Metrics and PTP metrics. Furthermore, using the Data Efficient Image Transformer (DeiT) module, we distilled a Vision Transformer (ViT) model from a fine-tuned ResNet152 as a teacher in the classification phase. This approach demonstrates promising strides in reliable tumor detection and classification, offering potential advancements in tumor diagnosis for real-world medical imaging scenarios.
- [1818] arXiv:2401.03497 (cross-list from eess.AS) [ pdf , other ]
-
Title: EAT: Self-Supervised Pre-Training with Efficient Audio TransformerSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Abstract: Audio self-supervised learning (SSL) pre-training, which aims to learn good representations from unlabeled audio, has made remarkable progress. However, the extensive computational demands during pre-training pose a significant barrier to the potential application and optimization of audio SSL models. In this paper, inspired by the success of data2vec 2.0 in image modality and Audio-MAE in audio modality, we introduce Efficient Audio Transformer (EAT) to further improve the effectiveness and efficiency in audio SSL. The proposed EAT adopts the bootstrap self-supervised training paradigm to the audio domain. A novel Utterance-Frame Objective (UFO) is designed to enhance the modeling capability of acoustic events. Furthermore, we reveal that the masking strategy is critical in audio SSL pre-training, and superior audio representations can be obtained with large inverse block masks. Experiment results demonstrate that EAT achieves state-of-the-art (SOTA) performance on a range of audio-related tasks, including AudioSet (AS-2M, AS-20K), ESC-50, and SPC-2, along with a significant pre-training speedup up to ~15x compared to existing audio SSL models.
- [1819] arXiv:2401.03621 (cross-list from eess.IV) [ pdf , ps , other ]
-
Title: Machine Learning Applications in Traumatic Brain Injury: A Spotlight on Mild TBIComments: The manuscript has 34 pages, 3 figures, and 4 tablesSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Traumatic Brain Injury (TBI) poses a significant global public health challenge, contributing to high morbidity and mortality rates and placing a substantial economic burden on healthcare systems worldwide. The diagnosis of TBI relies on clinical information along with Computed Tomography (CT) scans. Addressing the multifaceted challenges posed by TBI has seen the development of innovative, data-driven approaches, for this complex condition. Particularly noteworthy is the prevalence of mild TBI (mTBI), which constitutes the majority of TBI cases where conventional methods often fall short. As such, we review the state-of-the-art Machine Learning (ML) techniques applied to clinical information and CT scans in TBI, with a particular focus on mTBI. We categorize ML applications based on their data sources, and there is a spectrum of ML techniques used to date. Most of these techniques have primarily focused on diagnosis, with relatively few attempts at predicting the prognosis. This review may serve as a source of inspiration for future research studies aimed at improving the diagnosis of TBI using data-driven approaches and standard diagnostic data.
- [1820] arXiv:2401.03639 (cross-list from q-bio.NC) [ pdf , ps , other ]
-
Title: Deep Learning for Visual NeuroprosthesisSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: The visual pathway involves complex networks of cells and regions which contribute to the encoding and processing of visual information. While some aspects of visual perception are understood, there are still many unanswered questions regarding the exact mechanisms of visual encoding and the organization of visual information along the pathway. This chapter discusses the importance of visual perception and the challenges associated with understanding how visual information is encoded and represented in the brain. Furthermore, this chapter introduces the concept of neuroprostheses: devices designed to enhance or replace bodily functions, and highlights the importance of constructing computational models of the visual pathway in the implementation of such devices. A number of such models, employing the use of deep learning models, are outlined, and their value to understanding visual coding and natural vision is discussed.
- [1821] arXiv:2401.03737 (cross-list from q-fin.CP) [ pdf , other ]
-
Title: Can Large Language Models Beat Wall Street? Unveiling the Potential of AI in Stock SelectionComments: 17 pages, 12 figures, 12 tablesSubjects: Computational Finance (q-fin.CP) ; Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: This paper introduces MarketSenseAI, an innovative framework leveraging GPT-4's advanced reasoning for selecting stocks in financial markets. By integrating Chain of Thought and In-Context Learning, MarketSenseAI analyzes diverse data sources, including market trends, news, fundamentals, and macroeconomic factors, to emulate expert investment decision-making. The development, implementation, and validation of the framework are elaborately discussed, underscoring its capability to generate actionable and interpretable investment signals. A notable feature of this work is employing GPT-4 both as a predictive mechanism and signal evaluator, revealing the significant impact of the AI-generated explanations on signal accuracy, reliability and acceptance. Through empirical testing on the competitive S&P 100 stocks over a 15-month period, MarketSenseAI demonstrated exceptional performance, delivering excess alpha of 10% to 30% and achieving a cumulative return of up to 72% over the period, while maintaining a risk profile comparable to the broader market. Our findings highlight the transformative potential of Large Language Models in financial decision-making, marking a significant leap in integrating generative AI into financial analytics and investment strategies.
- [1822] arXiv:2401.04125 (cross-list from physics.ao-ph) [ pdf , other ]
-
Title: DeepPhysiNet: Bridging Deep Learning and Atmospheric Physics for Accurate and Continuous Weather ModelingComments: 18 pages, 9 figuresSubjects: Atmospheric and Oceanic Physics (physics.ao-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Accurate weather forecasting holds significant importance to human activities. Currently, there are two paradigms for weather forecasting: Numerical Weather Prediction (NWP) and Deep Learning-based Prediction (DLP). NWP utilizes atmospheric physics for weather modeling but suffers from poor data utilization and high computational costs, while DLP can learn weather patterns from vast amounts of data directly but struggles to incorporate physical laws. Both paradigms possess their respective strengths and weaknesses, and are incompatible, because physical laws adopted in NWP describe the relationship between coordinates and meteorological variables, while DLP directly learns the relationships between meteorological variables without consideration of coordinates. To address these problems, we introduce the DeepPhysiNet framework, incorporating physical laws into deep learning models for accurate and continuous weather system modeling. First, we construct physics networks based on multilayer perceptrons (MLPs) for individual meteorological variable, such as temperature, pressure, and wind speed. Physics networks establish relationships between variables and coordinates by taking coordinates as input and producing variable values as output. The physical laws in the form of Partial Differential Equations (PDEs) can be incorporated as a part of loss function. Next, we construct hyper-networks based on deep learning methods to directly learn weather patterns from a large amount of meteorological data. The output of hyper-networks constitutes a part of the weights for the physics networks. Experimental results demonstrate that, upon successful integration of physical laws, DeepPhysiNet can accomplish multiple tasks simultaneously, not only enhancing forecast accuracy but also obtaining continuous spatiotemporal resolution results, which is unattainable by either the NWP or DLP.
- [1823] arXiv:2401.04478 (cross-list from q-bio.BM) [ pdf , other ]
-
Title: TwinBooster: Synergising Large Language Models with Barlow Twins and Gradient Boosting for Enhanced Molecular Property PredictionComments: 13(+9) pages(+appendix), 5 figures, 11 tablesSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: The success of drug discovery and development relies on the precise prediction of molecular activities and properties. While in silico molecular property prediction has shown remarkable potential, its use has been limited so far to assays for which large amounts of data are available. In this study, we use a fine-tuned large language model to integrate biological assays based on their textual information, coupled with Barlow Twins, a Siamese neural network using a novel self-supervised learning approach. This architecture uses both assay information and molecular fingerprints to extract the true molecular information. TwinBooster enables the prediction of properties of unseen bioassays and molecules by providing state-of-the-art zero-shot learning tasks. Remarkably, our artificial intelligence pipeline shows excellent performance on the FS-Mol benchmark. This breakthrough demonstrates the application of deep learning to critical property prediction tasks where data is typically scarce. By accelerating the early identification of active molecules in drug discovery and development, this method has the potential to help streamline the identification of novel therapeutics.
- [1824] arXiv:2401.04579 (cross-list from q-bio.QM) [ pdf , ps , other ]
-
Title: A Deep Network for Explainable Prediction of Non-Imaging Phenotypes using Anatomical Multi-View DataAuthors: Yuxiang Wei , Yuqian Chen , Tengfei Xue , Leo Zekelman , Nikos Makris , Yogesh Rathi , Weidong Cai , Fan Zhang , Lauren J. O' DonnellComments: 2023 The Medical Image Computing and Computer Assisted Intervention Society workshopSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Abstract: Large datasets often contain multiple distinct feature sets, or views, that offer complementary information that can be exploited by multi-view learning methods to improve results. We investigate anatomical multi-view data, where each brain anatomical structure is described with multiple feature sets. In particular, we focus on sets of white matter microstructure and connectivity features from diffusion MRI, as well as sets of gray matter area and thickness features from structural MRI. We investigate machine learning methodology that applies multi-view approaches to improve the prediction of non-imaging phenotypes, including demographics (age), motor (strength), and cognition (picture vocabulary). We present an explainable multi-view network (EMV-Net) that can use different anatomical views to improve prediction performance. In this network, each individual anatomical view is processed by a view-specific feature extractor and the extracted information from each view is fused using a learnable weight. This is followed by a wavelet transform-based module to obtain complementary information across views which is then applied to calibrate the view-specific information. Additionally, the calibrator produces an attention-based calibration score to indicate anatomical structures' importance for interpretation.
- [1825] arXiv:2401.04849 (cross-list from econ.EM) [ pdf , ps , other ]
-
Title: A Deep Learning Representation of Spatial Interaction Model for Resilient Spatial Planning of Community Business ClustersSubjects: Econometrics (econ.EM) ; Artificial Intelligence (cs.AI)
Abstract: Existing Spatial Interaction Models (SIMs) are limited in capturing the complex and context-aware interactions between business clusters and trade areas. To address the limitation, we propose a SIM-GAT model to predict spatiotemporal visitation flows between community business clusters and their trade areas. The model innovatively represents the integrated system of business clusters, trade areas, and transportation infrastructure within an urban region using a connected graph. Then, a graph-based deep learning model, i.e., Graph AttenTion network (GAT), is used to capture the complexity and interdependencies of business clusters. We developed this model with data collected from the Miami metropolitan area in Florida. We then demonstrated its effectiveness in capturing varying attractiveness of business clusters to different residential neighborhoods and across scenarios with an eXplainable AI approach. We contribute a novel method supplementing conventional SIMs to predict and analyze the dynamics of inter-connected community business clusters. The analysis results can inform data-evidenced and place-specific planning strategies helping community business clusters better accommodate their customers across scenarios, and hence improve the resilience of community businesses.
- [1826] arXiv:2401.04950 (cross-list from physics.data-an) [ pdf , other ]
-
Title: Information Flow Rate for Cross-Correlated Stochastic ProcessesAuthors: Dionissios T. HristopulosComments: 16 pages, 5 figures; to appear in IEEE Transactions on Signal ProcessingJournal-ref: IEEE Transactions on Signal Processing, vol. 72, pp. 839-854, 2024Subjects: Data Analysis, Statistics and Probability (physics.data-an) ; Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Abstract: Causal inference seeks to identify cause-and-effect interactions in coupled systems. A recently proposed method by Liang detects causal relations by quantifying the direction and magnitude of information flow between time series. The theoretical formulation of information flow for stochastic dynamical systems provides a general expression and a data-driven statistic for the rate of entropy transfer between different system units. To advance understanding of information flow rate in terms of intuitive concepts and physically meaningful parameters, we investigate statistical properties of the data-driven information flow rate between coupled stochastic processes. We derive relations between the expectation of the information flow rate statistic and properties of the auto- and cross-correlation functions. Thus, we elucidate the dependence of the information flow rate on the analytical properties and characteristic times of the correlation functions. Our analysis provides insight into the influence of the sampling step, the strength of cross-correlations, and the temporal delay of correlations on information flow rate. We support the theoretical results with numerical simulations of correlated Gaussian processes.
- [1827] arXiv:2401.05342 (cross-list from q-bio.NC) [ pdf , other ]
-
Title: Most discriminative stimuli for functional cell type clusteringAuthors: Max F. Burg , Thomas Zenkel , Michaela Vystrčilová , Jonathan Oesterle , Larissa Höfling , Konstantin F. Willeke , Jan Lause , Sarah Müller , Paul G. Fahey , Zhiwei Ding , Kelli Restivo , Shashwat Sridhar , Tim Gollisch , Philipp Berens , Andreas S. Tolias , Thomas Euler , Matthias Bethge , Alexander S. EckerSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Identifying cell types and understanding their functional properties is crucial for unraveling the mechanisms underlying perception and cognition. In the retina, functional types can be identified by carefully selected stimuli, but this requires expert domain knowledge and biases the procedure towards previously known cell types. In the visual cortex, it is still unknown what functional types exist and how to identify them. Thus, for unbiased identification of the functional cell types in retina and visual cortex, new approaches are needed. Here we propose an optimization-based clustering approach using deep predictive models to obtain functional clusters of neurons using Most Discriminative Stimuli (MDS). Our approach alternates between stimulus optimization with cluster reassignment akin to an expectation-maximization algorithm. The algorithm recovers functional clusters in mouse retina, marmoset retina and macaque visual area V4. This demonstrates that our approach can successfully find discriminative stimuli across species, stages of the visual system and recording techniques. The resulting most discriminative stimuli can be used to assign functional cell types fast and on the fly, without the need to train complex predictive models or show a large natural scene dataset, paving the way for experiments that were previously limited by experimental time. Crucially, MDS are interpretable: they visualize the distinctive stimulus patterns that most unambiguously identify a specific type of neuron.
- [1828] arXiv:2401.05370 (cross-list from q-bio.BM) [ pdf , other ]
-
Title: Autoregressive fragment-based diffusion for pocket-aware ligand designComments: Accepted, NeurIPS 2023 Generative AI and Biology Workshop. OpenReview: this https URLSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Quantitative Methods (q-bio.QM)
Abstract: In this work, we introduce AutoFragDiff, a fragment-based autoregressive diffusion model for generating 3D molecular structures conditioned on target protein structures. We employ geometric vector perceptrons to predict atom types and spatial coordinates of new molecular fragments conditioned on molecular scaffolds and protein pockets. Our approach improves the local geometry of the resulting 3D molecules while maintaining high predicted binding affinity to protein targets. The model can also perform scaffold extension from user-provided starting molecular scaffold.
- [1829] arXiv:2401.05384 (cross-list from math.HO) [ pdf , other ]
-
Title: From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf PromptingComments: 16 pagesSubjects: History and Overview (math.HO) ; Artificial Intelligence (cs.AI)
Abstract: This paper investigates the performance of Large Language Models (LLMs) and Tool-augmented LLMs in tackling complex mathematical reasoning tasks. We introduce IMP-TIP: Improving Math Reasoning with Tool-augmented Interleaf Prompting, a framework that combines the strengths of both LLMs and Tool-augmented LLMs. IMP-TIP follows the ``From Good to Great" concept, collecting multiple potential solutions from both LLMs and their Tool-Augmented counterparts for the same math problem, and then selecting or re-generating the most accurate answer after cross-checking these solutions via tool-augmented interleaf prompting. The framework incorporates two key aspects: self-prompt and tool-augmented interleaf prompting (TIP). The former allows LLMs to autonomously refine and improve an initial prompt related to tool usage, while the latter enables LLMs to derive the final answer by dynamically analyzing the problem, cross-checking potential solutions, and revising previous reasoning hints in an interleaved manner. Experimental analysis shows that IMP-TIP achieves enhanced mathematical capabilities and outperforms traditional LLMs and tool-augmented LLMs in accuracy and reasoning diversity on math reasoning tasks. For instance, IMP-TIP can improve Tool-augmented ChatGPT on GSM8K-Hard from 56.0% to 65.2%.
- [1830] arXiv:2401.05395 (cross-list from econ.GN) [ pdf , other ]
-
Title: SRNI-CAR: A comprehensive dataset for analyzing the Chinese automotive marketJournal-ref: Proceedings of 2023 IEEE International Conference on Big Data (BigData), page 3405-3412Subjects: General Economics (econ.GN) ; Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Abstract: The automotive industry plays a critical role in the global economy, and particularly important is the expanding Chinese automobile market due to its immense scale and influence. However, existing automotive sector datasets are limited in their coverage, failing to adequately consider the growing demand for more and diverse variables. This paper aims to bridge this data gap by introducing a comprehensive dataset spanning the years from 2016 to 2022, encompassing sales data, online reviews, and a wealth of information related to the Chinese automotive industry. This dataset serves as a valuable resource, significantly expanding the available data. Its impact extends to various dimensions, including improving forecasting accuracy, expanding the scope of business applications, informing policy development and regulation, and advancing academic research within the automotive sector. To illustrate the dataset's potential applications in both business and academic contexts, we present two application examples. Our developed dataset enhances our understanding of the Chinese automotive market and offers a valuable tool for researchers, policymakers, and industry stakeholders worldwide.
- [1831] arXiv:2401.05402 (cross-list from cond-mat.mtrl-sci) [ pdf , other ]
-
Title: Vector Field Oriented Diffusion Model for Crystal Material GenerationSubjects: Materials Science (cond-mat.mtrl-sci) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Discovering crystal structures with specific chemical properties has become an increasingly important focus in material science. However, current models are limited in their ability to generate new crystal lattices, as they only consider atomic positions or chemical composition. To address this issue, we propose a probabilistic diffusion model that utilizes a geometrically equivariant GNN to consider atomic positions and crystal lattices jointly. To evaluate the effectiveness of our model, we introduce a new generation metric inspired by Frechet Inception Distance, but based on GNN energy prediction rather than InceptionV3 used in computer vision. In addition to commonly used metrics like validity, which assesses the plausibility of a structure, this new metric offers a more comprehensive evaluation of our model's capabilities. Our experiments on existing benchmarks show the significance of our diffusion model. We also show that our method can effectively learn meaningful representations.
- [1832] arXiv:2401.05406 (cross-list from eess.SP) [ pdf , other ]
-
Title: RFRL Gym: A Reinforcement Learning Testbed for Cognitive Radio ApplicationsAuthors: Daniel Rosen (1), Illa Rochez (1), Caleb McIrvin (1), Joshua Lee (1), Kevin D'Alessandro (1), Max Wiecek (1), Nhan Hoang (1), Ramzy Saffarini (1), Sam Philips (1), Vanessa Jones (1), Will Ivey (1), Zavier Harris-Smart (2), Zavion Harris-Smart (2), Zayden Chin (2), Amos Johnson (2), Alyse M. Jones (1), William C. Headley (1) ((1) Virginia Tech, (2) Morehouse College)Subjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
Abstract: Radio Frequency Reinforcement Learning (RFRL) is anticipated to be a widely applicable technology in the next generation of wireless communication systems, particularly 6G and next-gen military communications. Given this, our research is focused on developing a tool to promote the development of RFRL techniques that leverage spectrum sensing. In particular, the tool was designed to address two cognitive radio applications, specifically dynamic spectrum access and jamming. In order to train and test reinforcement learning (RL) algorithms for these applications, a simulation environment is necessary to simulate the conditions that an agent will encounter within the Radio Frequency (RF) spectrum. In this paper, such an environment has been developed, herein referred to as the RFRL Gym. Through the RFRL Gym, users can design their own scenarios to model what an RL agent may encounter within the RF spectrum as well as experiment with different spectrum sensing techniques. Additionally, the RFRL Gym is a subclass of OpenAI gym, enabling the use of third-party ML/RL Libraries. We plan to open-source this codebase to enable other researchers to utilize the RFRL Gym to test their own scenarios and RL algorithms, ultimately leading to the advancement of RL research in the wireless communications domain. This paper describes in further detail the components of the Gym, results from example scenarios, and plans for future additions.
Index Terms-machine learning, reinforcement learning, wireless communications, dynamic spectrum access, OpenAI gym - [1833] arXiv:2401.05409 (cross-list from eess.SP) [ pdf , other ]
-
Title: Image-based Data Representations of Time Series: A Comparative Analysis in EEG Artifact DetectionComments: 13 pages, 4 figuresSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Alternative data representations are powerful tools that augment the performance of downstream models. However, there is an abundance of such representations within the machine learning toolbox, and the field lacks a comparative understanding of the suitability of each representation method.
In this paper, we propose artifact detection and classification within EEG data as a testbed for profiling image-based data representations of time series data. We then evaluate eleven popular deep learning architectures on each of six commonly-used representation methods.
We find that, while the choice of representation entails a choice within the tradeoff between bias and variance, certain representations are practically more effective in highlighting features which increase the signal-to-noise ratio of the data. We present our results on EEG data, and open-source our testing framework to enable future comparative analyses in this vein. - [1834] arXiv:2401.05416 (cross-list from eess.SP) [ pdf , other ]
-
Title: Wavelet Dynamic Selection Network for Inertial Sensor Signal EnhancementComments: Accepted by AAAI 2024 - Association for the Advancement of Artificial IntelligenceSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: As attitude and motion sensing components, inertial sensors are widely used in various portable devices. But the severe errors of inertial sensors restrain their function, especially the trajectory recovery and semantic recognition. As a mainstream signal processing method, wavelet is hailed as the mathematical microscope of signal due to the plentiful and diverse wavelet basis functions. However, complicated noise types and application scenarios of inertial sensors make selecting wavelet basis perplexing. To this end, we propose a wavelet dynamic selection network (WDSNet), which intelligently selects the appropriate wavelet basis for variable inertial signals. In addition, existing deep learning architectures excel at extracting features from input data but neglect to learn the characteristics of target categories, which is essential to enhance the category awareness capability, thereby improving the selection of wavelet basis. Therefore, we propose a category representation mechanism (CRM), which enables the network to extract and represent category features without increasing trainable parameters. Furthermore, CRM transforms the common fully connected network into category representations, which provide closer supervision to the feature extractor than the far and trivial one-hot classification labels. We call this process of imposing interpretability on a network and using it to supervise the feature extractor the feature supervision mechanism, and its effectiveness is demonstrated experimentally and theoretically in this paper. The enhanced inertial signal can perform impracticable tasks with regard to the original signal, such as trajectory reconstruction. Both quantitative and visual results show that WDSNet outperforms the existing methods. Remarkably, WDSNet, as a weakly-supervised method, achieves the state-of-the-art performance of all the compared fully-supervised methods.
- [1835] arXiv:2401.05418 (cross-list from eess.SP) [ pdf , ps , other ]
-
Title: ANALYTiC: Understanding Decision Boundaries and Dimensionality Reduction in Machine LearningAuthors: Salman HaidriComments: Bachelor's thesisSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Abstract: The advent of compact, handheld devices has given us a pool of tracked movement data that could be used to infer trends and patterns that can be made to use. With this flooding of various trajectory data of animals, humans, vehicles, etc., the idea of ANALYTiC originated, using active learning to infer semantic annotations from the trajectories by learning from sets of labeled data. This study explores the application of dimensionality reduction and decision boundaries in combination with the already present active learning, highlighting patterns and clusters in data. We test these features with three different trajectory datasets with objective of exploiting the the already labeled data and enhance their interpretability. Our experimental analysis exemplifies the potential of these combined methodologies in improving the efficiency and accuracy of trajectory labeling. This study serves as a stepping-stone towards the broader integration of machine learning and visual methods in context of movement data analysis.
- [1836] arXiv:2401.05422 (cross-list from eess.SP) [ pdf , ps , other ]
-
Title: Machine Learning (ML)-assisted Beam Management in millimeter (mm)Wave Distributed Multiple Input Multiple Output (D-MIMO) systemsSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI)
Abstract: Beam management (BM) protocols are critical for establishing and maintaining connectivity between network radio nodes and User Equipments (UEs). In Distributed Multiple Input Multiple Output systems (D-MIMO), a number of access points (APs), coordinated by a central processing unit (CPU), serves a number of UEs. At mmWave frequencies, the problem of finding the best AP and beam to serve the UEs is challenging due to a large number of beams that need to be sounded with Downlink (DL) reference signals. The objective of this paper is to investigate whether the best AP/beam can be reliably inferred from sounding only a small subset of beams and leveraging AI/ML for inference of best beam/AP. We use Random Forest (RF), MissForest (MF) and conditional Generative Adversarial Networks (c-GAN) for demonstrating the performance benefits of inference.
- [1837] arXiv:2401.05426 (cross-list from eess.SP) [ pdf , other ]
-
Title: CoSS: Co-optimizing Sensor and Sampling Rate for Data-Efficient AI in Human Activity RecognitionComments: Accepeted by the 2nd Workshop on Sustainable AI (AAAI24)Subjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Recent advancements in Artificial Neural Networks have significantly improved human activity recognition using multiple time-series sensors. While employing numerous sensors with high-frequency sampling rates usually improves the results, it often leads to data inefficiency and unnecessary expansion of the ANN, posing a challenge for their practical deployment on edge devices. Addressing these issues, our work introduces a pragmatic framework for data-efficient utilization in HAR tasks, considering the optimization of both sensor modalities and sampling rate simultaneously. Central to our approach are the designed trainable parameters, termed 'Weight Scores,' which assess the significance of each sensor modality and sampling rate during the training phase. These scores guide the sensor modalities and sampling rate selection. The pruning method allows users to make a trade-off between computational budgets and performance by selecting the sensor modalities and sampling rates according to the weight score ranking. We tested our framework's effectiveness in optimizing sensor modality and sampling rate selection using three public HAR benchmark datasets. The results show that the sensor and sampling rate combination selected via CoSS achieves similar classification performance to configurations using the highest sampling rate with all sensors but at a reduced hardware cost.
- [1838] arXiv:2401.05431 (cross-list from eess.SP) [ pdf , other ]
-
Title: TRLS: A Time Series Representation Learning Framework via Spectrogram for Medical Signal ProcessingAuthors: Luyuan Xie , Cong Li , Xin Zhang , Shengfang Zhai , Yuejian Fang , Qingni Shen , Zhonghai WuComments: This paper is accept by ICASSP 2024. This is a more detailed versionSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Representation learning frameworks in unlabeled time series have been proposed for medical signal processing. Despite the numerous excellent progresses have been made in previous works, we observe the representation extracted for the time series still does not generalize well. In this paper, we present a Time series (medical signal) Representation Learning framework via Spectrogram (TRLS) to get more informative representations. We transform the input time-domain medical signals into spectrograms and design a time-frequency encoder named Time Frequency RNN (TFRNN) to capture more robust multi-scale representations from the augmented spectrograms. Our TRLS takes spectrogram as input with two types of different data augmentations and maximizes the similarity between positive ones, which effectively circumvents the problem of designing negative samples. Our evaluation of four real-world medical signal datasets focusing on medical signal classification shows that TRLS is superior to the existing frameworks.
- [1839] arXiv:2401.05434 (cross-list from eess.SP) [ pdf , ps , other ]
-
Title: ECGformer: Leveraging transformer for ECG heartbeat arrhythmia classificationSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: An arrhythmia, also known as a dysrhythmia, refers to an irregular heartbeat. There are various types of arrhythmias that can originate from different areas of the heart, resulting in either a rapid, slow, or irregular heartbeat. An electrocardiogram (ECG) is a vital diagnostic tool used to detect heart irregularities and abnormalities, allowing experts to analyze the heart's electrical signals to identify intricate patterns and deviations from the norm. Over the past few decades, numerous studies have been conducted to develop automated methods for classifying heartbeats based on ECG data. In recent years, deep learning has demonstrated exceptional capabilities in tackling various medical challenges, particularly with transformers as a model architecture for sequence processing. By leveraging the transformers, we developed the ECGformer model for the classification of various arrhythmias present in electrocardiogram data. We assessed the suggested approach using the MIT-BIH and PTB datasets. ECG heartbeat arrhythmia classification results show that the proposed method is highly effective.
- [1840] arXiv:2401.05437 (cross-list from eess.SP) [ pdf , other ]
-
Title: Representation Learning for Wearable-Based Applications in the Case of Missing DataComments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024 ( this https URL )Subjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Wearable devices continuously collect sensor data and use it to infer an individual's behavior, such as sleep, physical activity, and emotions. Despite the significant interest and advancements in this field, modeling multimodal sensor data in real-world environments is still challenging due to low data quality and limited data annotations. In this work, we investigate representation learning for imputing missing wearable data and compare it with state-of-the-art statistical approaches. We investigate the performance of the transformer model on 10 physiological and behavioral signals with different masking ratios. Our results show that transformers outperform baselines for missing data imputation of signals that change more frequently, but not for monotonic signals. We further investigate the impact of imputation strategies and masking rations on downstream classification tasks. Our study provides insights for the design and development of masking-based self-supervised learning tasks and advocates the adoption of hybrid-based imputation strategies to address the challenge of missing data in wearable devices.
- [1841] arXiv:2401.05446 (cross-list from eess.SP) [ pdf , other ]
-
Title: Self-supervised Learning for Electroencephalogram: A Systematic SurveyComments: 35 pages, 12 figuresSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Electroencephalogram (EEG) is a non-invasive technique to record bioelectrical signals. Integrating supervised deep learning techniques with EEG signals has recently facilitated automatic analysis across diverse EEG-based tasks. However, the label issues of EEG signals have constrained the development of EEG-based deep models. Obtaining EEG annotations is difficult that requires domain experts to guide collection and labeling, and the variability of EEG signals among different subjects causes significant label shifts. To solve the above challenges, self-supervised learning (SSL) has been proposed to extract representations from unlabeled samples through well-designed pretext tasks. This paper concentrates on integrating SSL frameworks with temporal EEG signals to achieve efficient representation and proposes a systematic review of the SSL for EEG signals. In this paper, 1) we introduce the concept and theory of self-supervised learning and typical SSL frameworks. 2) We provide a comprehensive review of SSL for EEG analysis, including taxonomy, methodology, and technique details of the existing EEG-based SSL frameworks, and discuss the difference between these methods. 3) We investigate the adaptation of the SSL approach to various downstream tasks, including the task description and related benchmark datasets. 4) Finally, we discuss the potential directions for future SSL-EEG research.
- [1842] arXiv:2401.05447 (cross-list from q-fin.ST) [ pdf , other ]
-
Title: Can ChatGPT Compute Trustworthy Sentiment Scores from Bloomberg Market Wraps?Authors: Baptiste Lefort , Eric Benhamou , Jean-Jacques Ohana , David Saltiel , Beatrice Guez , Damien ChalletSubjects: Statistical Finance (q-fin.ST) ; Artificial Intelligence (cs.AI)
Abstract: We used a dataset of daily Bloomberg Financial Market Summaries from 2010 to 2023, reposted on large financial media, to determine how global news headlines may affect stock market movements using ChatGPT and a two-stage prompt approach. We document a statistically significant positive correlation between the sentiment score and future equity market returns over short to medium term, which reverts to a negative correlation over longer horizons. Validation of this correlation pattern across multiple equity markets indicates its robustness across equity regions and resilience to non-linearity, evidenced by comparison of Pearson and Spearman correlations. Finally, we provide an estimate of the optimal horizon that strikes a balance between reactivity to new information and correlation.
- [1843] arXiv:2401.05535 (cross-list from stat.ML) [ pdf , other ]
-
Title: Improving the Accuracy and Interpretability of Random Forests via Forest PruningAuthors: Albert DoradorSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Optimization and Control (math.OC)
Abstract: Decades after their inception, random forests continue to provide state-of-the-art accuracy in a variety of learning problems, outperforming in this respect alternative machine learning algorithms such as decision trees or even neural networks. However, being an ensemble method, the one aspect where random forests tend to severely underperform decision trees is interpretability. In the present work, we propose a post-hoc approach that aims to have the best of both worlds: the accuracy of random forests and the interpretability of decision trees. To this end, we present two forest-pruning methods to find an optimal sub-forest within a given random forest, and then, when applicable, combine the selected trees into one. Our first method relies on constrained exhaustive search, while our second method is based on an adaptation of the LASSO methodology. Extensive experiments over synthetic and real world datasets show that, in the majority of scenarios, at least one of the two methods proposed is more accurate than the original random forest, while just using a small fraction of the trees, aiding result interpretability. Compared to current state-of-the-art forest pruning methods, namely sequential forward selection and (a variation of) sequential backward selection, our methods tend to outperform both of them, whether in terms of accuracy, number of trees employed, or both.
- [1844] arXiv:2401.05815 (cross-list from physics.acc-ph) [ pdf , other ]
-
Title: Cheetah: Bridging the Gap Between Machine Learning and Particle Accelerator Physics with High-Speed, Differentiable SimulationsComments: 16 pages, 9 figures, 3 tablesSubjects: Accelerator Physics (physics.acc-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Machine learning has emerged as a powerful solution to the modern challenges in accelerator physics. However, the limited availability of beam time, the computational cost of simulations, and the high-dimensionality of optimisation problems pose significant challenges in generating the required data for training state-of-the-art machine learning models. In this work, we introduce Cheetah, a PyTorch-based high-speed differentiable linear-beam dynamics code. Cheetah enables the fast collection of large data sets by reducing computation times by multiple orders of magnitude and facilitates efficient gradient-based optimisation for accelerator tuning and system identification. This positions Cheetah as a user-friendly, readily extensible tool that integrates seamlessly with widely adopted machine learning tools. We showcase the utility of Cheetah through five examples, including reinforcement learning training, gradient-based beamline tuning, gradient-based system identification, physics-informed Bayesian optimisation priors, and modular neural network surrogate modelling of space charge effects. The use of such a high-speed differentiable simulation code will simplify the development of machine learning-based methods for particle accelerators and fast-track their integration into everyday operations of accelerator facilities.
- [1845] arXiv:2401.05848 (cross-list from cond-mat.mtrl-sci) [ pdf , other ]
-
Title: Pushing the Pareto front of band gap and permittivity: ML-guided search for dielectric materialsComments: 27 pages, 11 figures, 5 authorsSubjects: Materials Science (cond-mat.mtrl-sci) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
Abstract: Materials with high-dielectric constant easily polarize under external electric fields, allowing them to perform essential functions in many modern electronic devices. Their practical utility is determined by two conflicting properties: high dielectric constants tend to occur in materials with narrow band gaps, limiting the operating voltage before dielectric breakdown. We present a high-throughput workflow that combines element substitution, ML pre-screening, ab initio simulation and human expert intuition to efficiently explore the vast space of unknown materials for potential dielectrics, leading to the synthesis and characterization of two novel dielectric materials, CsTaTeO6 and Bi2Zr2O7. Our key idea is to deploy ML in a multi-objective optimization setting with concave Pareto front. While usually considered more challenging than single-objective optimization, we argue and show preliminary evidence that the $1/x$-correlation between band gap and permittivity in fact makes the task more amenable to ML methods by allowing separate models for band gap and permittivity to each operate in regions of good training support while still predicting materials of exceptional merit. To our knowledge, this is the first instance of successful ML-guided multi-objective materials optimization achieving experimental synthesis and characterization. CsTaTeO6 is a structure generated via element substitution not present in our reference data sources, thus exemplifying successful de-novo materials design. Meanwhile, we report the first high-purity synthesis and dielectric characterization of Bi2Zr2O7 with a band gap of 2.27 eV and a permittivity of 20.5, meeting all target metrics of our multi-objective search.
- [1846] arXiv:2401.06005 (cross-list from q-bio.NC) [ pdf , other ]
-
Title: How does the primate brain combine generative and discriminative computations in vision?Authors: Benjamin Peters , James J. DiCarlo , Todd Gureckis , Ralf Haefner , Leyla Isik , Joshua Tenenbaum , Talia Konkle , Thomas Naselaris , Kimberly Stachenfeld , Zenna Tavares , Doris Tsao , Ilker Yildirim , Nikolaus KriegeskorteSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Vision is widely understood as an inference problem. However, two contrasting conceptions of the inference process have each been influential in research on biological vision as well as the engineering of machine vision. The first emphasizes bottom-up signal flow, describing vision as a largely feedforward, discriminative inference process that filters and transforms the visual information to remove irrelevant variation and represent behaviorally relevant information in a format suitable for downstream functions of cognition and behavioral control. In this conception, vision is driven by the sensory data, and perception is direct because the processing proceeds from the data to the latent variables of interest. The notion of "inference" in this conception is that of the engineering literature on neural networks, where feedforward convolutional neural networks processing images are said to perform inference. The alternative conception is that of vision as an inference process in Helmholtz's sense, where the sensory evidence is evaluated in the context of a generative model of the causal processes giving rise to it. In this conception, vision inverts a generative model through an interrogation of the evidence in a process often thought to involve top-down predictions of sensory data to evaluate the likelihood of alternative hypotheses. The authors include scientists rooted in roughly equal numbers in each of the conceptions and motivated to overcome what might be a false dichotomy between them and engage the other perspective in the realm of theory and experiment. The primate brain employs an unknown algorithm that may combine the advantages of both conceptions. We explain and clarify the terminology, review the key empirical evidence, and propose an empirical research program that transcends the dichotomy and sets the stage for revealing the mysterious hybrid algorithm of primate vision.
- [1847] arXiv:2401.06148 (cross-list from eess.IV) [ pdf , other ]
-
Title: Artificial Intelligence for Digital and Computational PathologyAuthors: Andrew H. Song , Guillaume Jaume , Drew F.K. Williamson , Ming Y. Lu , Anurag Vaidya , Tiffany R. Miller , Faisal MahmoodJournal-ref: Nature Reviews Bioengineering 2023Subjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Quantitative Methods (q-bio.QM)
Abstract: Advances in digitizing tissue slides and the fast-paced progress in artificial intelligence, including deep learning, have boosted the field of computational pathology. This field holds tremendous potential to automate clinical diagnosis, predict patient prognosis and response to therapy, and discover new morphological biomarkers from tissue images. Some of these artificial intelligence-based systems are now getting approved to assist clinical diagnosis; however, technical barriers remain for their widespread clinical adoption and integration as a research tool. This Review consolidates recent methodological advances in computational pathology for predicting clinical end points in whole-slide images and highlights how these developments enable the automation of clinical practice and the discovery of new biomarkers. We then provide future perspectives as the field expands into a broader range of clinical and research tasks with increasingly diverse modalities of clinical data.
- [1848] arXiv:2401.06150 (cross-list from eess.IV) [ pdf , other ]
-
Title: D-STGCNT: A Dense Spatio-Temporal Graph Conv-GRU Network based on transformer for assessment of patient physical rehabilitationComments: 15 pages, Computers in Biology and Medicine JournalSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: This paper tackles the challenge of automatically assessing physical rehabilitation exercises for patients who perform the exercises without clinician supervision. The objective is to provide a quality score to ensure correct performance and achieve desired results. To achieve this goal, a new graph-based model, the Dense Spatio-Temporal Graph Conv-GRU Network with Transformer, is introduced. This model combines a modified version of STGCN and transformer architectures for efficient handling of spatio-temporal data. The key idea is to consider skeleton data respecting its non-linear structure as a graph and detecting joints playing the main role in each rehabilitation exercise. Dense connections and GRU mechanisms are used to rapidly process large 3D skeleton inputs and effectively model temporal dynamics. The transformer encoder's attention mechanism focuses on relevant parts of the input sequence, making it useful for evaluating rehabilitation exercises. The evaluation of our proposed approach on the KIMORE and UI-PRMD datasets highlighted its potential, surpassing state-of-the-art methods in terms of accuracy and computational time. This resulted in faster and more accurate learning and assessment of rehabilitation exercises. Additionally, our model provides valuable feedback through qualitative illustrations, effectively highlighting the significance of joints in specific exercises.
- [1849] arXiv:2401.06151 (cross-list from q-bio.BM) [ pdf , other ]
-
Title: Towards Joint Sequence-Structure Generation of Nucleic Acid and Protein Complexes with SE(3)-Discrete DiffusionComments: 15 pages, 11 figures, presented at the NeurIPS 2023 Machine Learning in Structural Biology (MLSB) workshop. Code available at this https URLSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Abstract: Generative models of macromolecules carry abundant and impactful implications for industrial and biomedical efforts in protein engineering. However, existing methods are currently limited to modeling protein structures or sequences, independently or jointly, without regard to the interactions that commonly occur between proteins and other macromolecules. In this work, we introduce MMDiff, a generative model that jointly designs sequences and structures of nucleic acid and protein complexes, independently or in complex, using joint SE(3)-discrete diffusion noise. Such a model has important implications for emerging areas of macromolecular design including structure-based transcription factor design and design of noncoding RNA sequences. We demonstrate the utility of MMDiff through a rigorous new design benchmark for macromolecular complex generation that we introduce in this work. Our results demonstrate that MMDiff is able to successfully generate micro-RNA and single-stranded DNA molecules while being modestly capable of joint modeling DNA and RNA molecules in interaction with multi-chain protein complexes. Source code: this https URL .
- [1850] arXiv:2401.06166 (cross-list from q-bio.BM) [ pdf , ps , other ]
-
Title: Adjustable Molecular Representation for Unified Pre-training StrategySubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: We propose a new large-scale molecular model, named AdaMR, which stands for Adjustable Molecular Representation for Unified Pre-training Strategy. Unlike recent large-scale molecular models that use a single molecular encoding, AdaMR employs a granularity-adjustable molecular encoder, learning molecular representations at both the atomic and substructure levels. For the pre-training process, we designed a task for molecular canonicalization, which involves transforming ltiple generic molecular representations into canonical representations. By adjusting the granularity of molecular encoding, the trained model can improve the effects on multiple downstream tasks, such as model attribute prediction and molecule generation. Substructure-level molecular representation retains information of specific atom groups or arrangements that determine chemical properties and have similar functions, which is beneficial for tasks like property prediction. Meanwhile, atomic-level representation, combined with generative molecular canonicalization pre-training tasks, enhances the validity, novelty, and uniqueness in generative tasks. These features of AdaMR demonstrate its strong performance in numerous downstream tasks. We use different molecular properties prediction tasks on six different datasets on MoleculeNet and two generative tasks on ZINC250K dataset to evaluate our proposed molecular encoding and pre-training methods, and obtain state-of-the-art (SOTA) results on five of these tasks.
- [1851] arXiv:2401.06183 (cross-list from eess.AS) [ pdf , other ]
-
Title: End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Speech has long been a barrier to effective communication and connection, persisting as a challenge in our increasingly interconnected world. This research paper introduces a transformative solution to this persistent obstacle an end-to-end speech conversion framework tailored for Hindi-to-English translation, culminating in the synthesis of English audio. By integrating cutting-edge technologies such as XLSR Wav2Vec2 for automatic speech recognition (ASR), mBART for neural machine translation (NMT), and a Text-to-Speech (TTS) synthesis component, this framework offers a unified and seamless approach to cross-lingual communication. We delve into the intricate details of each component, elucidating their individual contributions and exploring the synergies that enable a fluid transition from spoken Hindi to synthesized English audio.
- [1852] arXiv:2401.06199 (cross-list from q-bio.QM) [ pdf , other ]
-
Title: xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of ProteinAuthors: Bo Chen , Xingyi Cheng , Pan Li , Yangli-ao Geng , Jing Gong , Shen Li , Zhilei Bei , Xu Tan , Boyan Wang , Xin Zeng , Chiming Liu , Aohan Zeng , Yuxiao Dong , Jie Tang , Le SongSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them struggle to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework. Our key technical contribution is an exploration of the compatibility and the potential for joint optimization of the two types of objectives, which has led to a strategy for training xTrimoPGLM at an unprecedented scale of 100 billion parameters and 1 trillion training tokens. Our extensive experiments reveal that 1) xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories. The model also facilitates an atomic-resolution view of protein structures, leading to an advanced 3D structural prediction model that surpasses existing language model-based tools. 2) xTrimoPGLM not only can generate de novo protein sequences following the principles of natural ones, but also can perform programmable generation after supervised fine-tuning (SFT) on curated sequences. These results highlight the substantial capability and versatility of xTrimoPGLM in understanding and generating protein sequences, contributing to the evolving landscape of foundation models in protein science.
- [1853] arXiv:2401.06230 (cross-list from physics.geo-ph) [ pdf , other ]
-
Title: WISE: full-Waveform variational Inference via Subsurface ExtensionsSubjects: Geophysics (physics.geo-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Signal Processing (eess.SP); Applications (stat.AP)
Abstract: We introduce a probabilistic technique for full-waveform inversion, employing variational inference and conditional normalizing flows to quantify uncertainty in migration-velocity models and its impact on imaging. Our approach integrates generative artificial intelligence with physics-informed common-image gathers, reducing reliance on accurate initial velocity models. Considered case studies demonstrate its efficacy producing realizations of migration-velocity models conditioned by the data. These models are used to quantify amplitude and positioning effects during subsequent imaging.
- [1854] arXiv:2401.06300 (cross-list from quant-ph) [ pdf , other ]
-
Title: Advantage of Quantum Neural Networks as Quantum Information DecodersComments: 25 pages, 5 figuresSubjects: Quantum Physics (quant-ph) ; Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: A promising strategy to protect quantum information from noise-induced errors is to encode it into the low-energy states of a topological quantum memory device. However, readout errors from such memory under realistic settings is less understood. We study the problem of decoding quantum information encoded in the groundspaces of topological stabilizer Hamiltonians in the presence of generic perturbations, such as quenched disorder. We first prove that the standard stabilizer-based error correction and decoding schemes work adequately well in such perturbed quantum codes by showing that the decoding error diminishes exponentially in the distance of the underlying unperturbed code. We then prove that Quantum Neural Network (QNN) decoders provide an almost quadratic improvement on the readout error. Thus, we demonstrate provable advantage of using QNNs for decoding realistic quantum error-correcting codes, and our result enables the exploration of a wider range of non-stabilizer codes in the near-term laboratory settings.
- [1855] arXiv:2401.06588 (cross-list from eess.AS) [ pdf , other ]
-
Title: Dynamic Behaviour of Connectionist Speech Recognition with Strong Latency ConstraintsAuthors: Giampiero SalviJournal-ref: Speech Communication Volume 48, Issue 7, July 2006, Pages 802-818Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
Abstract: This paper describes the use of connectionist techniques in phonetic speech recognition with strong latency constraints. The constraints are imposed by the task of deriving the lip movements of a synthetic face in real time from the speech signal, by feeding the phonetic string into an articulatory synthesiser. Particular attention has been paid to analysing the interaction between the time evolution model learnt by the multi-layer perceptrons and the transition model imposed by the Viterbi decoder, in different latency conditions. Two experiments were conducted in which the time dependencies in the language model (LM) were controlled by a parameter. The results show a strong interaction between the three factors involved, namely the neural network topology, the length of time dependencies in the LM and the decoder latency.
- [1856] arXiv:2401.06777 (cross-list from eess.IV) [ pdf , other ]
-
Title: Multimodal Neuroimaging Attention-Based architecture for Cognitive Decline PredictionSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI)
Abstract: The early detection of Alzheimer's Disease is imperative to ensure early treatment and improve patient outcomes. There has consequently been extenstive research into detecting AD and its intermediate phase, mild cognitive impairment (MCI). However, there is very small literature in predicting the conversion to AD and MCI from normal cognitive condition. Recently, multiple studies have applied convolutional neural networks (CNN) which integrate Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) to classify MCI and AD. However, in these works, the fusion of MRI and PET features are simply achieved through concatenation, resulting in a lack of cross-modal interactions. In this paper, we propose a novel multimodal neuroimaging attention-based CNN architecture, MNA-net, to predict whether cognitively normal (CN) individuals will develop MCI or AD within a period of 10 years. To address the lack of interactions across neuroimaging modalities seen in previous works, MNA-net utilises attention mechanisms to form shared representations of the MRI and PET images. The proposed MNA-net is tested in OASIS-3 dataset and is able to predict CN individuals who converted to MCI or AD with an accuracy of 83%, true negative rate of 80%, and true positive rate of 86%. The new state of the art results improved by 5% and 10% for accuracy and true negative rate by the use of attention mechanism. These results demonstrate the potential of the proposed model to predict cognitive impairment and attention based mechanisms in the fusion of different neuroimaging modalities to improve the prediction of cognitive decline.
- [1857] arXiv:2401.06780 (cross-list from eess.IV) [ pdf , other ]
-
Title: HA-HI: Synergising fMRI and DTI through Hierarchical Alignments and Hierarchical Interactions for Mild Cognitive Impairment DiagnosisAuthors: Xiongri Shen , Zhenxi Song , Linling Li , Min Zhang , Lingyan Liang Honghai Liu , Demao Deng , Zhiguo ZhangSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: Early diagnosis of mild cognitive impairment (MCI) and subjective cognitive decline (SCD) utilizing multi-modal magnetic resonance imaging (MRI) is a pivotal area of research. While various regional and connectivity features from functional MRI (fMRI) and diffusion tensor imaging (DTI) have been employed to develop diagnosis models, most studies integrate these features without adequately addressing their alignment and interactions. This limits the potential to fully exploit the synergistic contributions of combined features and modalities. To solve this gap, our study introduces a novel Hierarchical Alignments and Hierarchical Interactions (HA-HI) method for MCI and SCD classification, leveraging the combined strengths of fMRI and DTI. HA-HI efficiently learns significant MCI- or SCD- related regional and connectivity features by aligning various feature types and hierarchically maximizing their interactions. Furthermore, to enhance the interpretability of our approach, we have developed the Synergistic Activation Map (SAM) technique, revealing the critical brain regions and connections that are indicative of MCI/SCD. Comprehensive evaluations on the ADNI dataset and our self-collected data demonstrate that HA-HI outperforms other existing methods in diagnosing MCI and SCD, making it a potentially vital and interpretable tool for early detection. The implementation of this method is publicly accessible at this https URL .
- [1858] arXiv:2401.06788 (cross-list from eess.AS) [ pdf , other ]
-
Title: The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023Comments: Included in CNVSRC Workshop 2023, NCMMSC 2023Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Sound (cs.SD)
Abstract: This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP-LiAuto (Team 237) in the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, engaging in the fixed and open tracks of Single-Speaker VSR Task, and the open track of Multi-Speaker VSR Task. In terms of data processing, we leverage the lip motion extractor from the baseline1 to produce multi-scale video data. Besides, various augmentation techniques are applied during training, encompassing speed perturbation, random rotation, horizontal flipping, and color transformation. The VSR model adopts an end-to-end architecture with joint CTC/attention loss, comprising a ResNet3D visual frontend, an E-Branchformer encoder, and a Transformer decoder. Experiments show that our system achieves 34.76% CER for the Single-Speaker Task and 41.06% CER for the Multi-Speaker Task after multi-system fusion, ranking first place in all three tracks we participate.
- [1859] arXiv:2401.07043 (cross-list from quant-ph) [ pdf , other ]
-
Title: Quantum Advantage Actor-Critic for Reinforcement LearningAuthors: Michael Kölle , Mohamad Hgog , Fabian Ritz , Philipp Altmann , Maximilian Zorn , Jonas Stein , Claudia Linnhoff-PopienComments: Accepted at ICAART 24Subjects: Quantum Physics (quant-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Quantum computing offers efficient encapsulation of high-dimensional states. In this work, we propose a novel quantum reinforcement learning approach that combines the Advantage Actor-Critic algorithm with variational quantum circuits by substituting parts of the classical components. This approach addresses reinforcement learning's scalability concerns while maintaining high performance. We empirically test multiple quantum Advantage Actor-Critic configurations with the well known Cart Pole environment to evaluate our approach in control tasks with continuous state spaces. Our results indicate that the hybrid strategy of using either a quantum actor or quantum critic with classical post-processing yields a substantial performance increase compared to pure classical and pure quantum variants with similar parameter counts. They further reveal the limits of current quantum approaches due to the hardware constraints of noisy intermediate-scale quantum computers, suggesting further research to scale hybrid approaches for larger and more complex control tasks.
- [1860] arXiv:2401.07054 (cross-list from quant-ph) [ pdf , other ]
-
Title: A Reinforcement Learning Environment for Directed Quantum Circuit SynthesisAuthors: Michael Kölle , Tom Schubert , Philipp Altmann , Maximilian Zorn , Jonas Stein , Claudia Linnhoff-PopienSubjects: Quantum Physics (quant-ph) ; Artificial Intelligence (cs.AI)
Abstract: With recent advancements in quantum computing technology, optimizing quantum circuits and ensuring reliable quantum state preparation have become increasingly vital. Traditional methods often demand extensive expertise and manual calculations, posing challenges as quantum circuits grow in qubit- and gate-count. Therefore, harnessing machine learning techniques to handle the growing variety of gate-to-qubit combinations is a promising approach. In this work, we introduce a comprehensive reinforcement learning environment for quantum circuit synthesis, where circuits are constructed utilizing gates from the the Clifford+T gate set to prepare specific target states. Our experiments focus on exploring the relationship between the depth of synthesized quantum circuits and the circuit depths used for target initialization, as well as qubit count. We organize the environment configurations into multiple evaluation levels and include a range of well-known quantum states for benchmarking purposes. We also lay baselines for evaluating the environment using Proximal Policy Optimization. By applying the trained agents to benchmark tests, we demonstrated their ability to reliably design minimal quantum circuits for a selection of 2-qubit Bell states.
- [1861] arXiv:2401.07336 (cross-list from eess.AS) [ pdf , ps , other ]
-
Title: Construction and Evaluation of Mandarin Multimodal Emotional Speech DatabaseAuthors: Zhu Ting , Li Liangqi , Duan Shufei , Zhang Xueying , Xiao Zhongzhe , Jia Hairng , Liang HuizhiSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
Abstract: A multi-modal emotional speech Mandarin database including articulatory kinematics, acoustics, glottal and facial micro-expressions is designed and established, which is described in detail from the aspects of corpus design, subject selection, recording details and data processing. Where signals are labeled with discrete emotion labels (neutral, happy, pleasant, indifferent, angry, sad, grief) and dimensional emotion labels (pleasure, arousal, dominance). In this paper, the validity of dimension annotation is verified by statistical analysis of dimension annotation data. The SCL-90 scale data of annotators are verified and combined with PAD annotation data for analysis, so as to explore the internal relationship between the outlier phenomenon in annotation and the psychological state of annotators. In order to verify the speech quality and emotion discrimination of the database, this paper uses 3 basic models of SVM, CNN and DNN to calculate the recognition rate of these seven emotions. The results show that the average recognition rate of seven emotions is about 82% when using acoustic data alone. When using glottal data alone, the average recognition rate is about 72%. Using kinematics data alone, the average recognition rate also reaches 55.7%. Therefore, the database is of high quality and can be used as an important source for speech analysis research, especially for the task of multimodal emotional speech analysis.
- [1862] arXiv:2401.07379 (cross-list from q-bio.QM) [ pdf , other ]
-
Title: Inference of dynamical gene regulatory networks from single-cell data with physics informed neural networksComments: 25 pages, 8 figuresSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Biological Physics (physics.bio-ph); Molecular Networks (q-bio.MN)
Abstract: One of the main goals of developmental biology is to reveal the gene regulatory networks (GRNs) underlying the robust differentiation of multipotent progenitors into precisely specified cell types. Most existing methods to infer GRNs from experimental data have limited predictive power as the inferred GRNs merely reflect gene expression similarity or correlation. Here, we demonstrate, how physics-informed neural networks (PINNs) can be used to infer the parameters of predictive, dynamical GRNs that provide mechanistic understanding of biological processes. Specifically we study GRNs that exhibit bifurcation behavior and can therefore model cell differentiation. We show that PINNs outperform regular feed-forward neural networks on the parameter inference task and analyze two relevant experimental scenarios: 1. a system with cell communication for which gene expression trajectories are available and 2. snapshot measurements of a cell population in which cell communication is absent. Our analysis will inform the design of future experiments to be analyzed with PINNs and provides a starting point to explore this powerful class of neural network models further.
- [1863] arXiv:2401.07489 (cross-list from physics.flu-dyn) [ pdf , other ]
-
Title: The Principle of Minimum Pressure Gradient: An Alternative Basis for Physics-Informed Learning of Incompressible Fluid MechanicsJournal-ref: AIP Advances. 14 (2024) 045112Subjects: Fluid Dynamics (physics.flu-dyn) ; Artificial Intelligence (cs.AI)
Abstract: Recent advances in the application of physics-informed learning into the field of fluid mechanics have been predominantly grounded in the Newtonian framework, primarly leveraging Navier-Stokes Equation or one of its various derivative to train a neural network. Here, we propose an alternative approach based on variational methods. The proposed approach uses the principle of minimum pressure gradient combined with the continuity constraint to train a neural network and predict the flow field in incompressible fluids. We describe the underlying principles of the proposed approach, then use a demonstrative example to illustrate its implementation and show that it reduces the computational time per training epoch when compared to the conventional approach.
- [1864] arXiv:2401.07969 (cross-list from nlin.CG) [ pdf , other ]
-
Title: Simulated Autopoiesis in Liquid AutomataAuthors: Steve BattleComments: 12 pagesSubjects: Cellular Automata and Lattice Gases (nlin.CG) ; Artificial Intelligence (cs.AI)
Abstract: We present a novel form of Liquid Automata, using this to simulate autopoiesis, whereby living machines self-organise in the physical realm. This simulation is based on an earlier Cellular Automaton described by Francisco Varela. The basis of Liquid Automata is a particle simulation with additional rules about how particles are transformed on collision with other particles. Unlike cellular automata, there is no fixed grid or time-step, only particles moving about and colliding with each other in a continuous space/time.
- [1865] arXiv:2401.08268 (cross-list from eess.AS) [ pdf , other ]
-
Title: An Explainable Proxy Model for Multiabel Audio SegmentationComments: Accepted at ICASSP 2024Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
Abstract: Audio signal segmentation is a key task for automatic audio indexing. It consists of detecting the boundaries of class-homogeneous segments in the signal. In many applications, explainable AI is a vital process for transparency of decision-making with machine learning. In this paper, we propose an explainable multilabel segmentation model that solves speech activity (SAD), music (MD), noise (ND), and overlapped speech detection (OSD) simultaneously. This proxy uses the non-negative matrix factorization (NMF) to map the embedding used for the segmentation to the frequency domain. Experiments conducted on two datasets show similar performances as the pre-trained black box model while showing strong explainability features. Specifically, the frequency bins used for the decision can be easily identified at both the segment level (local explanations) and global level (class prototypes).
- [1866] arXiv:2401.08585 (cross-list from q-bio.NC) [ pdf , other ]
-
Title: From Conceptual Spaces to Quantum Concepts: Formalising and Learning Structured Conceptual ModelsComments: This article consolidates our previous reports on concept formalisation and learning: arXiv:2302.14822 and arXiv:2203.11216Subjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI); Quantum Physics (quant-ph)
Abstract: In this article we present a new modelling framework for structured concepts using a category-theoretic generalisation of conceptual spaces, and show how the conceptual representations can be learned automatically from data, using two very different instantiations: one classical and one quantum. A contribution of the work is a thorough category-theoretic formalisation of our framework. We claim that the use of category theory, and in particular the use of string diagrams to describe quantum processes, helps elucidate some of the most important features of our approach. We build upon Gardenfors' classical framework of conceptual spaces, in which cognition is modelled geometrically through the use of convex spaces, which in turn factorise in terms of simpler spaces called domains. We show how concepts from the domains of shape, colour, size and position can be learned from images of simple shapes, where concepts are represented as Gaussians in the classical implementation, and quantum effects in the quantum one. In the classical case we develop a new model which is inspired by the Beta-VAE model of concepts, but is designed to be more closely connected with language, so that the names of concepts form part of the graphical model. In the quantum case, concepts are learned by a hybrid classical-quantum network trained to perform concept classification, where the classical image processing is carried out by a convolutional neural network and the quantum representations are produced by a parameterised quantum circuit. Finally, we consider the question of whether our quantum models of concepts can be considered conceptual spaces in the Gardenfors sense.
- [1867] arXiv:2401.08699 (cross-list from eess.IV) [ pdf , ps , other ]
-
Title: On Image Search in HistopathologyComments: A chapter in the Book "Artificial INtelligence in Digital Pathology" by Cohen and Chauhan, 2024Subjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Quantitative Methods (q-bio.QM)
Abstract: Pathology images of histopathology can be acquired from camera-mounted microscopes or whole slide scanners. Utilizing similarity calculations to match patients based on these images holds significant potential in research and clinical contexts. Recent advancements in search technologies allow for implicit quantification of tissue morphology across diverse primary sites, facilitating comparisons and enabling inferences about diagnosis, and potentially prognosis, and predictions for new patients when compared against a curated database of diagnosed and treated cases. In this paper, we comprehensively review the latest developments in image search technologies for histopathology, offering a concise overview tailored for computational pathology researchers seeking effective, fast and efficient image search methods in their work.
- [1868] arXiv:2401.09019 (cross-list from eess.IV) [ pdf , other ]
-
Title: Change Detection Between Optical Remote Sensing Imagery and Map Data via Segment Anything Model (SAM)Subjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
Abstract: Unsupervised multimodal change detection is pivotal for time-sensitive tasks and comprehensive multi-temporal Earth monitoring. In this study, we explore unsupervised multimodal change detection between two key remote sensing data sources: optical high-resolution imagery and OpenStreetMap (OSM) data. Specifically, we propose to utilize the vision foundation model Segmentation Anything Model (SAM), for addressing our task. Leveraging SAM's exceptional zero-shot transfer capability, high-quality segmentation maps of optical images can be obtained. Thus, we can directly compare these two heterogeneous data forms in the so-called segmentation domain. We then introduce two strategies for guiding SAM's segmentation process: the 'no-prompt' and 'box/mask prompt' methods. The two strategies are designed to detect land-cover changes in general scenarios and to identify new land-cover objects within existing backgrounds, respectively. Experimental results on three datasets indicate that the proposed approach can achieve more competitive results compared to representative unsupervised multimodal change detection methods.
- [1869] arXiv:2401.09354 (cross-list from eess.AS) [ pdf , ps , other ]
-
Title: Transcending Controlled Environments Assessing the Transferability of ASRRobust NLU Models to Real-World ApplicationsSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Sound (cs.SD)
Abstract: This research investigates the transferability of Automatic Speech Recognition (ASR)-robust Natural Language Understanding (NLU) models from controlled experimental conditions to practical, real-world applications. Focused on smart home automation commands in Urdu, the study assesses model performance under diverse noise profiles, linguistic variations, and ASR error scenarios. Leveraging the UrduBERT model, the research employs a systematic methodology involving real-world data collection, cross-validation, transfer learning, noise variation studies, and domain adaptation. Evaluation metrics encompass task-specific accuracy, latency, user satisfaction, and robustness to ASR errors. The findings contribute insights into the challenges and adaptability of ASR-robust NLU models in transcending controlled environments.
- [1870] arXiv:2401.09424 (cross-list from physics.ao-ph) [ pdf , other ]
-
Title: Precipitation Prediction Using an Ensemble of Lightweight LearnersSubjects: Atmospheric and Oceanic Physics (physics.ao-ph) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Precipitation prediction plays a crucial role in modern agriculture and industry. However, it poses significant challenges due to the diverse patterns and dynamics in time and space, as well as the scarcity of high precipitation events.
To address this challenge, we propose an ensemble learning framework that leverages multiple learners to capture the diverse patterns of precipitation distribution. Specifically, the framework consists of a precipitation predictor with multiple lightweight heads (learners) and a controller that combines the outputs from these heads. The learners and the controller are separately optimized with a proposed 3-stage training scheme.
By utilizing provided satellite images, the proposed approach can effectively model the intricate rainfall patterns, especially for high precipitation events. It achieved 1st place on the core test as well as the nowcasting leaderboards of the Weather4Cast 2023 competition. For detailed implementation, please refer to our GitHub repository at: this https URL . - [1871] arXiv:2401.09435 (cross-list from math.ST) [ pdf , other ]
-
Title: Reasoning with random sets: An agenda for the futureAuthors: Fabio CuzzolinComments: 94 pages, 17 figuresSubjects: Statistics Theory (math.ST) ; Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Abstract: In this paper, we discuss a potential agenda for future work in the theory of random sets and belief functions, touching upon a number of focal issues: the development of a fully-fledged theory of statistical reasoning with random sets, including the generalisation of logistic regression and of the classical laws of probability; the further development of the geometric approach to uncertainty, to include general random sets, a wider range of uncertainty measures and alternative geometric representations; the application of this new theory to high-impact areas such as climate change, machine learning and statistical learning theory.
- [1872] arXiv:2401.09451 (cross-list from q-bio.BM) [ pdf , other ]
-
Title: Diffusion-Driven Generative Framework for Molecular Conformation PredictionComments: arXiv admin note: text overlap with arXiv:2105.07246 by other authorsSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
Abstract: The task of deducing three-dimensional molecular configurations from their two-dimensional graph representations holds paramount importance in the fields of computational chemistry and pharmaceutical development. The rapid advancement of machine learning, particularly within the domain of deep generative networks, has revolutionized the precision of predictive modeling in this context. Traditional approaches often adopt a two-step strategy: initially estimating interatomic distances and subsequently refining the spatial molecular structure by solving a distance geometry problem. However, this sequential approach occasionally falls short in accurately capturing the intricacies of local atomic arrangements, thereby compromising the fidelity of the resulting structural models. Addressing these limitations, this research introduces a cutting-edge generative framework named \method{}. This framework is grounded in the principles of diffusion observed in classical non-equilibrium thermodynamics. \method{} views atoms as discrete entities and excels in guiding the reversal of diffusion, transforming a distribution of stochastic noise back into coherent molecular structures through a process akin to a Markov chain. This transformation commences with the initial representation of a molecular graph in an abstract latent space, culminating in the realization of three-dimensional structures via a sophisticated bilevel optimization scheme meticulously tailored to meet the specific requirements of the task. One of the formidable challenges in this modeling endeavor involves preserving roto-translational invariance to ensure that the generated molecular conformations adhere to the laws of physics. Extensive experimental evaluations confirm the efficacy of the proposed \method{} in comparison to state-of-the-art methods.
- [1873] arXiv:2401.09466 (cross-list from physics.ao-ph) [ pdf , other ]
-
Title: Self Supervised Vision for Climate DownscalingAuthors: Karandeep Singh , Chaeyoon Jeong , Naufal Shidqi , Sungwon Park , Arjun Nellikkattil , Elke Zeller , Meeyoung ChaSubjects: Atmospheric and Oceanic Physics (physics.ao-ph) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Climate change is one of the most critical challenges that our planet is facing today. Rising global temperatures are already bringing noticeable changes to Earth's weather and climate patterns with an increased frequency of unpredictable and extreme weather events. Future projections for climate change research are based on Earth System Models (ESMs), the computer models that simulate the Earth's climate system. ESMs provide a framework to integrate various physical systems, but their output is bound by the enormous computational resources required for running and archiving higher-resolution simulations. For a given resource budget, the ESMs are generally run on a coarser grid, followed by a computationally lighter $downscaling$ process to obtain a finer-resolution output. In this work, we present a deep-learning model for downscaling ESM simulation data that does not require high-resolution ground truth data for model optimization. This is realized by leveraging salient data distribution patterns and the hidden dependencies between weather variables for an $\textit{individual}$ data point at $\textit{runtime}$. Extensive evaluation with $2$x, $3$x, and $4$x scaling factors demonstrates that the proposed model consistently obtains superior performance over that of various baselines. The improved downscaling performance and no dependence on high-resolution ground truth data make the proposed method a valuable tool for climate research and mark it as a promising direction for future research.
- [1874] arXiv:2401.09556 (cross-list from math.OC) [ pdf , ps , other ]
-
Title: Deep learning enhanced mixed integer optimization: Learning to reduce model dimensionalitySubjects: Optimization and Control (math.OC) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This work introduces a framework to address the computational complexity inherent in Mixed-Integer Programming (MIP) models by harnessing the potential of deep learning. We compare the effectiveness of (a) feed-forward neural networks (ANN) and (b) convolutional neural networks (CNN) in approximating the active dimensions within MIP problems. We utilize multi-label classification to account for more than one active dimension. To enhance the framework's performance, we employ Bayesian optimization for hyperparameter tuning, aiming to maximize sample-level accuracy. The primary objective is to train the neural networks to predict all active dimensions accurately, thereby maximizing the occurrence of global optimum solutions. We apply this framework to a flow-based facility location allocation Mixed-Integer Linear Programming (MILP) formulation that describes long-term investment planning and medium-term tactical planning in a personalized medicine supply chain for cell therapy manufacturing and distribution.
- [1875] arXiv:2401.09717 (cross-list from eess.AS) [ pdf , ps , other ]
-
Title: Parameter Selection for Analyzing Conversations with Autism Spectrum DisorderComments: 5 pages, 4 tables, Proceedings of INTERSPEECH 2023Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Sound (cs.SD)
Abstract: The diagnosis of autism spectrum disorder (ASD) is a complex, challenging task as it depends on the analysis of interactional behaviors by psychologists rather than the use of biochemical diagnostics. In this paper, we present a modeling approach to ASD diagnosis by analyzing acoustic/prosodic and linguistic features extracted from diagnostic conversations between a psychologist and children who either are typically developing (TD) or have ASD. We compare the contributions of different features across a range of conversation tasks. We focus on finding a minimal set of parameters that characterize conversational behaviors of children with ASD. Because ASD is diagnosed through conversational interaction, in addition to analyzing the behavior of the children, we also investigate whether the psychologist's conversational behaviors vary across diagnostic groups. Our results can facilitate fine-grained analysis of conversation data for children with ASD to support diagnosis and intervention.
- [1876] arXiv:2401.09833 (cross-list from eess.IV) [ pdf , other ]
-
Title: Slicer NetworksComments: 8 figures and 3 tablesSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: In medical imaging, scans often reveal objects with varied contrasts but consistent internal intensities or textures. This characteristic enables the use of low-frequency approximations for tasks such as segmentation and deformation field estimation. Yet, integrating this concept into neural network architectures for medical image analysis remains underexplored. In this paper, we propose the Slicer Network, a novel architecture designed to leverage these traits. Comprising an encoder utilizing models like vision transformers for feature extraction and a slicer employing a learnable bilateral grid, the Slicer Network strategically refines and upsamples feature maps via a splatting-blurring-slicing process. This introduces an edge-preserving low-frequency approximation for the network outcome, effectively enlarging the effective receptive field. The enhancement not only reduces computational complexity but also boosts overall performance. Experiments across different medical imaging applications, including unsupervised and keypoints-based image registration and lesion segmentation, have verified the Slicer Network's improved accuracy and efficiency.
- [1877] arXiv:2401.10032 (cross-list from eess.AS) [ pdf , other ]
-
Title: FreGrad: Lightweight and Fast Frequency-aware Diffusion VocoderComments: Accepted to ICASSP 2024Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Abstract: The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad. Our framework consists of the following three key components: (1) We employ discrete wavelet transform that decomposes a complicated waveform into sub-band wavelets, which helps FreGrad to operate on a simple and concise feature space, (2) We design a frequency-aware dilated convolution that elevates frequency awareness, resulting in generating speech with accurate frequency information, and (3) We introduce a bag of tricks that boosts the generation quality of the proposed model. In our experiments, FreGrad achieves 3.7 times faster training time and 2.2 times faster inference speed compared to our baseline while reducing the model size by 0.6 times (only 1.78M parameters) without sacrificing the output quality. Audio samples are available at: this https URL .
- [1878] arXiv:2401.10211 (cross-list from q-bio.QM) [ pdf , other ]
-
Title: Improving PTM Site Prediction by Coupling of Multi-Granularity Structure and Multi-Scale Sequence RepresentationSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Protein post-translational modification (PTM) site prediction is a fundamental task in bioinformatics. Several computational methods have been developed to predict PTM sites. However, existing methods ignore the structure information and merely utilize protein sequences. Furthermore, designing a more fine-grained structure representation learning method is urgently needed as PTM is a biological event that occurs at the atom granularity. In this paper, we propose a PTM site prediction method by Coupling of Multi-Granularity structure and Multi-Scale sequence representation, PTM-CMGMS for brevity. Specifically, multigranularity structure-aware representation learning is designed to learn neighborhood structure representations at the amino acid, atom, and whole protein granularity from AlphaFold predicted structures, followed by utilizing contrastive learning to optimize the structure representations.Additionally, multi-scale sequence representation learning is used to extract context sequence information, and motif generated by aligning all context sequences of PTM sites assists the prediction. Extensive experiments on three datasets show that PTM-CMGMS outperforms the state-of-the-art methods.
- [1879] arXiv:2401.10278 (cross-list from eess.SP) [ pdf , other ]
-
Title: EEGFormer: Towards Transferable and Interpretable Large-Scale EEG Foundation ModelComments: A preprint version of an ongoing workSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Neurons and Cognition (q-bio.NC)
Abstract: Self-supervised learning has emerged as a highly effective approach in the fields of natural language processing and computer vision. It is also applicable to brain signals such as electroencephalography (EEG) data, given the abundance of available unlabeled data that exist in a wide spectrum of real-world medical applications ranging from seizure detection to wave analysis. The existing works leveraging self-supervised learning on EEG modeling mainly focus on pretraining upon each individual dataset corresponding to a single downstream task, which cannot leverage the power of abundant data, and they may derive sub-optimal solutions with a lack of generalization. Moreover, these methods rely on end-to-end model learning which is not easy for humans to understand. In this paper, we present a novel EEG foundation model, namely EEGFormer, pretrained on large-scale compound EEG data. The pretrained model cannot only learn universal representations on EEG signals with adaptable performance on various downstream tasks but also provide interpretable outcomes of the useful patterns within the data. To validate the effectiveness of our model, we extensively evaluate it on various downstream tasks and assess the performance under different transfer settings. Furthermore, we demonstrate how the learned model exhibits transferable anomaly detection performance and provides valuable interpretability of the acquired patterns via self-supervised learning.
- [1880] arXiv:2401.10280 (cross-list from eess.SP) [ pdf , other ]
-
Title: GANs for EVT Based Model Parameter Estimation in Real-time Ultra-Reliable CommunicationSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Abstract: The Ultra-Reliable Low-Latency Communications (URLLC) paradigm in sixth-generation (6G) systems heavily relies on precise channel modeling, especially when dealing with rare and extreme events within wireless communication channels. This paper explores a novel methodology integrating Extreme Value Theory (EVT) and Generative Adversarial Networks (GANs) to achieve the precise channel modeling in real-time. The proposed approach harnesses EVT by employing the Generalized Pareto Distribution (GPD) to model the distribution of extreme events. Subsequently, Generative Adversarial Networks (GANs) are employed to estimate the parameters of the GPD. In contrast to conventional GAN configurations that focus on estimating the overall distribution, the proposed approach involves the incorporation of an additional block within the GAN structure. This specific augmentation is designed with the explicit purpose of directly estimating the parameters of the Generalized Pareto Distribution (GPD). Through extensive simulations across different sample sizes, the proposed GAN based approach consistently demonstrates superior adaptability, surpassing Maximum Likelihood Estimation (MLE), particularly in scenarios with limited sample sizes.
- [1881] arXiv:2401.10282 (cross-list from eess.SP) [ pdf , other ]
-
Title: BioDiffusion: A Versatile Diffusion Model for Biomedical Signal SynthesisSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Machine learning tasks involving biomedical signals frequently grapple with issues such as limited data availability, imbalanced datasets, labeling complexities, and the interference of measurement noise. These challenges often hinder the optimal training of machine learning algorithms. Addressing these concerns, we introduce BioDiffusion, a diffusion-based probabilistic model optimized for the synthesis of multivariate biomedical signals. BioDiffusion demonstrates excellence in producing high-fidelity, non-stationary, multivariate signals for a range of tasks including unconditional, label-conditional, and signal-conditional generation. Leveraging these synthesized signals offers a notable solution to the aforementioned challenges. Our research encompasses both qualitative and quantitative assessments of the synthesized data quality, underscoring its capacity to bolster accuracy in machine learning tasks tied to biomedical signals. Furthermore, when juxtaposed with current leading time-series generative models, empirical evidence suggests that BioDiffusion outperforms them in biomedical signal generation quality.
- [1882] arXiv:2401.10284 (cross-list from eess.SP) [ pdf , other ]
-
Title: MorpheusNet: Resource efficient sleep stage classifier for embedded on-line systemsAuthors: Ali Kavoosi , Morgan P. Mitchell , Raveen Kariyawasam , John E. Fleming , Penny Lewis , Heidi Johansen-Berg , Hayriye Cagnan , Timothy DenisonComments: This paper was presented at the 2023 IEEE conference on Systems, Man, and Cybernetics (SMC)Subjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Sleep Stage Classification (SSC) is a labor-intensive task, requiring experts to examine hours of electrophysiological recordings for manual classification. This is a limiting factor when it comes to leveraging sleep stages for therapeutic purposes. With increasing affordability and expansion of wearable devices, automating SSC may enable deployment of sleep-based therapies at scale. Deep Learning has gained increasing attention as a potential method to automate this process. Previous research has shown accuracy comparable to manual expert scores. However, previous approaches require sizable amount of memory and computational resources. This constrains the ability to classify in real time and deploy models on the edge. To address this gap, we aim to provide a model capable of predicting sleep stages in real-time, without requiring access to external computational sources (e.g., mobile phone, cloud). The algorithm is power efficient to enable use on embedded battery powered systems. Our compact sleep stage classifier can be deployed on most off-the-shelf microcontrollers (MCU) with constrained hardware settings. This is due to the memory footprint of our approach requiring significantly fewer operations. The model was tested on three publicly available data bases and achieved performance comparable to the state of the art, whilst reducing model complexity by orders of magnitude (up to 280 times smaller compared to state of the art). We further optimized the model with quantization of parameters to 8 bits with only an average drop of 0.95% in accuracy. When implemented in firmware, the quantized model achieves a latency of 1.6 seconds on an Arm CortexM4 processor, allowing its use for on-line SSC-based therapies.
- [1883] arXiv:2401.10334 (cross-list from q-bio.QM) [ pdf , other ]
-
Title: DrugAssist: A Large Language Model for Molecule OptimizationAuthors: Geyan Ye , Xibao Cai , Houtim Lai , Xing Wang , Junhong Huang , Longyue Wang , Wei Liu , Xiangxiang ZengComments: Geyan Ye and Xibao Cai are equal contributors; Longyue Wang is corresponding authorSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Recently, the impressive performance of large language models (LLMs) on a wide range of tasks has attracted an increasing number of attempts to apply LLMs in drug discovery. However, molecule optimization, a critical task in the drug discovery pipeline, is currently an area that has seen little involvement from LLMs. Most of existing approaches focus solely on capturing the underlying patterns in chemical structures provided by the data, without taking advantage of expert feedback. These non-interactive approaches overlook the fact that the drug discovery process is actually one that requires the integration of expert experience and iterative refinement. To address this gap, we propose DrugAssist, an interactive molecule optimization model which performs optimization through human-machine dialogue by leveraging LLM's strong interactivity and generalizability. DrugAssist has achieved leading results in both single and multiple property optimization, simultaneously showcasing immense potential in transferability and iterative optimization. In addition, we publicly release a large instruction-based dataset called MolOpt-Instructions for fine-tuning language models on molecule optimization tasks. We have made our code and data publicly available at this https URL , which we hope to pave the way for future research in LLMs' application for drug discovery.
- [1884] arXiv:2401.10348 (cross-list from q-bio.NC) [ pdf , other ]
-
Title: Exploring General Intelligence via Gated Graph Transformer in Functional Connectivity StudiesAuthors: Gang Qu , Anton Orlichenko , Junqi Wang , Gemeng Zhang , Li Xiao , Aiying Zhang , Zhengming Ding , Yu-Ping WangSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI)
Abstract: Functional connectivity (FC) as derived from fMRI has emerged as a pivotal tool in elucidating the intricacies of various psychiatric disorders and delineating the neural pathways that underpin cognitive and behavioral dynamics inherent to the human brain. While Graph Neural Networks (GNNs) offer a structured approach to represent neuroimaging data, they are limited by their need for a predefined graph structure to depict associations between brain regions, a detail not solely provided by FCs. To bridge this gap, we introduce the Gated Graph Transformer (GGT) framework, designed to predict cognitive metrics based on FCs. Empirical validation on the Philadelphia Neurodevelopmental Cohort (PNC) underscores the superior predictive prowess of our model, further accentuating its potential in identifying pivotal neural connectivities that correlate with human cognitive processes.
- [1885] arXiv:2401.10746 (cross-list from eess.SP) [ pdf , other ]
-
Title: A Systematic Evaluation of Euclidean Alignment with Deep Learning for EEG DecodingComments: 14 pages and 10 figuresSubjects: Signal Processing (eess.SP) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Electroencephalography (EEG) signals are frequently used for various Brain-Computer Interface (BCI) tasks. While Deep Learning (DL) techniques have shown promising results, they are hindered by the substantial data requirements. By leveraging data from multiple subjects, transfer learning enables more effective training of DL models. A technique that is gaining popularity is Euclidean Alignment (EA) due to its ease of use, low computational complexity, and compatibility with Deep Learning models. However, few studies evaluate its impact on the training performance of shared and individual DL models. In this work, we systematically evaluate the effect of EA combined with DL for decoding BCI signals. We used EA to train shared models with data from multiple subjects and evaluated its transferability to new subjects. Our experimental results show that it improves decoding in the target subject by 4.33% and decreases convergence time by more than 70%. We also trained individual models for each subject to use as a majority-voting ensemble classifier. In this scenario, using EA improved the 3-model ensemble accuracy by 3.7%. However, when compared to the shared model with EA, the ensemble accuracy was 3.62% lower.
- [1886] arXiv:2401.10910 (cross-list from q-bio.NC) [ pdf , other ]
-
Title: Metacognition is all you need? Using Introspection in Generative Agents to Improve Goal-directed BehaviorComments: 9 pages, 4 figuresSubjects: Neurons and Cognition (q-bio.NC) ; Artificial Intelligence (cs.AI)
Abstract: Recent advances in Large Language Models (LLMs) have shown impressive capabilities in various applications, yet LLMs face challenges such as limited context windows and difficulties in generalization. In this paper, we introduce a metacognition module for generative agents, enabling them to observe their own thought processes and actions. This metacognitive approach, designed to emulate System 1 and System 2 cognitive processes, allows agents to significantly enhance their performance by modifying their strategy. We tested the metacognition module on a variety of scenarios, including a situation where generative agents must survive a zombie apocalypse, and observe that our system outperform others, while agents adapt and improve their strategies to complete tasks over time.
- [1887] arXiv:2401.10937 (cross-list from econ.TH) [ pdf , ps , other ]
-
Title: Subjective CausalitySubjects: Theoretical Economics (econ.TH) ; Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
Abstract: We show that it is possible to understand and identify a decision maker's subjective causal judgements by observing her preferences over interventions. Following Pearl [2000], we represent causality using causal models (also called structural equations models), where the world is described by a collection of variables, related by equations. We show that if a preference relation over interventions satisfies certain axioms (related to standard axioms regarding counterfactuals), then we can define (i) a causal model, (ii) a probability capturing the decision-maker's uncertainty regarding the external factors in the world and (iii) a utility on outcomes such that each intervention is associated with an expected utility and such that intervention $A$ is preferred to $B$ iff the expected utility of $A$ is greater than that of $B$. In addition, we characterize when the causal model is unique. Thus, our results allow a modeler to test the hypothesis that a decision maker's preferences are consistent with some causal model and to identify causal judgements from observed behavior.
- [1888] arXiv:2401.11351 (cross-list from quant-ph) [ pdf , other ]
-
Title: A comprehensive review of Quantum Machine Learning: from NISQ to Fault ToleranceComments: 28 pages. Invited reviewSubjects: Quantum Physics (quant-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Abstract: Quantum machine learning, which involves running machine learning algorithms on quantum devices, has garnered significant attention in both academic and business circles. In this paper, we offer a comprehensive and unbiased review of the various concepts that have emerged in the field of quantum machine learning. This includes techniques used in Noisy Intermediate-Scale Quantum (NISQ) technologies and approaches for algorithms compatible with fault-tolerant quantum computing hardware. Our review covers fundamental concepts, algorithms, and the statistical learning theory pertinent to quantum machine learning.
- [1889] arXiv:2401.11665 (cross-list from stat.ML) [ pdf , other ]
-
Title: Accelerating Approximate Thompson Sampling with Underdamped Langevin Monte CarloComments: 52 pages, 2 figures, to appear in AISTATS 2024Subjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Approximate Thompson sampling with Langevin Monte Carlo broadens its reach from Gaussian posterior sampling to encompass more general smooth posteriors. However, it still encounters scalability issues in high-dimensional problems when demanding high accuracy. To address this, we propose an approximate Thompson sampling strategy, utilizing underdamped Langevin Monte Carlo, where the latter is the go-to workhorse for simulations of high-dimensional posteriors. Based on the standard smoothness and log-concavity conditions, we study the accelerated posterior concentration and sampling using a specific potential function. This design improves the sample complexity for realizing logarithmic regrets from $\mathcal{\tilde O}(d)$ to $\mathcal{\tilde O}(\sqrt{d})$. The scalability and robustness of our algorithm are also empirically validated through synthetic experiments in high-dimensional bandit problems.
- [1890] arXiv:2401.12167 (cross-list from eess.IV) [ pdf , other ]
-
Title: Dynamic Semantic Compression for CNN Inference in Multi-access Edge Computing: A Graph Reinforcement Learning-based AutoencoderComments: arXiv admin note: text overlap with arXiv:2211.13745Subjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: This paper studies the computational offloading of CNN inference in dynamic multi-access edge computing (MEC) networks. To address the uncertainties in communication time and computation resource availability, we propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN), for effective semantic extraction and compression in partial offloading. In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features. In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy. To effectively trade-off communication, computation, and inference accuracy, we design a reward function and formulate the offloading problem of CNN inference as a maximization problem with the goal of maximizing the average inference accuracy and throughput over the long term. To address this maximization problem, we propose a graph reinforcement learning-based AECNN (GRL-AECNN) method, which outperforms existing works DROO-AECNN, GRL-BottleNet++ and GRL-DeepJSCC under different dynamic scenarios. This highlights the advantages of GRL-AECNN in offloading decision-making in dynamic MEC.
- [1891] arXiv:2401.12203 (cross-list from astro-ph.IM) [ pdf , other ]
-
Title: Unsupervised Machine Learning for the Classification of Astrophysical X-ray SourcesAuthors: Víctor Samuel Pérez-Díaz , Juan Rafael Martínez-Galarza , Alexander Caicedo , Raffaele D'AbruscoComments: 21 pages, 11 figures. Accepted in MNRASSubjects: Instrumentation and Methods for Astrophysics (astro-ph.IM) ; Artificial Intelligence (cs.AI)
Abstract: The automatic classification of X-ray detections is a necessary step in extracting astrophysical information from compiled catalogs of astrophysical sources. Classification is useful for the study of individual objects, statistics for population studies, as well as for anomaly detection, i.e., the identification of new unexplored phenomena, including transients and spectrally extreme sources. Despite the importance of this task, classification remains challenging in X-ray astronomy due to the lack of optical counterparts and representative training sets. We develop an alternative methodology that employs an unsupervised machine learning approach to provide probabilistic classes to Chandra Source Catalog sources with a limited number of labeled sources, and without ancillary information from optical and infrared catalogs. We provide a catalog of probabilistic classes for 8,756 sources, comprising a total of 14,507 detections, and demonstrate the success of the method at identifying emission from young stellar objects, as well as distinguishing between small-scale and large-scale compact accretors with a significant level of confidence. We investigate the consistency between the distribution of features among classified objects and well-established astrophysical hypotheses such as the unified AGN model. This provides interpretability to the probabilistic classifier. Code and tables are available publicly through GitHub. We provide a web playground for readers to explore our final classification at https://umlcaxs-playground.streamlit.app.
- [1892] arXiv:2401.12329 (cross-list from physics.soc-ph) [ pdf , ps , other ]
-
Title: Towards a prioritised use of transportation infrastructures: the case of vehicle-specific dynamic access restrictions to city centresAuthors: Holger Billhardt , Alberto Fernández , Pasqual Martí , Javier Prieto Tejedor , Sascha OssowskiJournal-ref: Electronics, Volume 11, Issue 4 (2022)Subjects: Physics and Society (physics.soc-ph) ; Artificial Intelligence (cs.AI)
Abstract: One of the main problems that local authorities of large cities have to face is the regulation of urban mobility. They need to provide the means to allow for the efficient movement of people and distribution of goods. However, the provisioning of transportation services needs to take into account general global objectives, like reducing emissions and having more healthy living environments, which may not always be aligned with individual interests. Urban mobility is usually provided through a transport infrastructure that includes all the elements that support mobility. On many occasions, the capacity of the elements of this infrastructure is lower than the actual demand and thus different transportation activities compete for their use. In this paper, we argue that scarce transport infrastructure elements should be assigned dynamically and in a prioritised manner to transport activities that have a higher utility from the point of view of society; for example, activities that produce less pollution and provide more value to society. In this paper, we define a general model for prioritizing the use of a particular type of transportation infrastructure element called time-unlimited elements, whose usage time is unknown a priori, and illustrate its dynamics through two use cases: vehicle-specific dynamic access restriction in city centres (i) based on the usage levels of available parking spaces and (ii) to assure sustained admissible air quality levels in the city centre. We carry out several experiments using the SUMO traffic simulation tool to evaluate our proposal.
- [1893] arXiv:2401.12570 (cross-list from eess.AS) [ pdf , other ]
-
Title: DiffMoog: a Differentiable Modular Synthesizer for Sound MatchingAuthors: Noy Uzrad , Oren Barkan , Almog Elharar , Shlomi Shvartzman , Moshe Laufer , Lior Wolf , Noam KoenigsteinComments: 5 pages, 7 figures, 1 table, Our code is released at this https URLSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Sound (cs.SD)
Abstract: This paper presents DiffMoog - a differentiable modular synthesizer with a comprehensive set of modules typically found in commercial instruments. Being differentiable, it allows integration into neural networks, enabling automated sound matching, to replicate a given audio input. Notably, DiffMoog facilitates modulation capabilities (FM/AM), low-frequency oscillators (LFOs), filters, envelope shapers, and the ability for users to create custom signal chains. We introduce an open-source platform that comprises DiffMoog and an end-to-end sound matching framework. This framework utilizes a novel signal-chain loss and an encoder network that self-programs its outputs to predict DiffMoogs parameters based on the user-defined modular architecture. Moreover, we provide insights and lessons learned towards sound matching using differentiable synthesis. Combining robust sound capabilities with a holistic platform, DiffMoog stands as a premier asset for expediting research in audio synthesis and machine learning.
- [1894] arXiv:2401.12771 (cross-list from eess.IV) [ pdf , ps , other ]
-
Title: Deep Learning-based Intraoperative MRI ReconstructionAuthors: Jon André Ottesen , Tryggve Storas , Svein Are Sirirud Vatnehol , Grethe Løvland , Einar O. Vik-Mo , Till Schellhorn , Karoline Skogen , Christopher Larsson , Atle Bjørnerud , Inge Rasmus Groote-Eindbaas , Matthan W.A. CaanSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Medical Physics (physics.med-ph)
Abstract: Purpose: To evaluate the quality of deep learning reconstruction for prospectively accelerated intraoperative magnetic resonance imaging (iMRI) during resective brain tumor surgery.
Materials and Methods: Accelerated iMRI was performed during brain surgery using dual surface coils positioned around the area of resection. A deep learning (DL) model was trained on the fastMRI neuro dataset to mimic the data from the iMRI protocol. Evaluation was performed on imaging material from 40 patients imaged between 01.11.2021 - 01.06.2023 that underwent iMRI during tumor resection surgery. A comparative analysis was conducted between the conventional compressed sense (CS) method and the trained DL reconstruction method. Blinded evaluation of multiple image quality metrics was performed by two working neuro-radiologists and a working neurosurgeon on a 1 to 5 Likert scale (1=non diagnostic, 2=poor, 3=acceptable, 4=good, 5=excellent), and the favored reconstruction variant.
Results: The DL reconstruction was strongly favored or favored over the CS reconstruction for 33/40, 39/40, and 8/40 of cases for reader 1, 2, and 3, respectively. Two of three readers consistently assigned higher ratings for the DL reconstructions, and the DL reconstructions had a higher score than their respective CS counterparts for 72%, 72%, and 14% of the cases for reader 1, 2, and 3, respectively. Still, the DL reconstructions exhibited shortcomings such as a striping artifact and reduced signal.
Conclusion: DL shows promise to allow for high-quality reconstructions of intraoperative MRI with equal to or improved perceived spatial resolution, signal-to-noise ratio, diagnostic confidence, diagnostic conspicuity, and spatial resolution compared to compressed sense. - [1895] arXiv:2401.12850 (cross-list from eess.AS) [ pdf , other ]
-
Title: Overlap-aware End-to-End Supervised Hierarchical Graph Clustering for Speaker DiarizationComments: 10 pagesSubjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Sound (cs.SD)
Abstract: Speaker diarization, the task of segmenting an audio recording based on speaker identity, constitutes an important speech pre-processing step for several downstream applications. The conventional approach to diarization involves multiple steps of embedding extraction and clustering, which are often optimized in an isolated fashion. While end-to-end diarization systems attempt to learn a single model for the task, they are often cumbersome to train and require large supervised datasets. In this paper, we propose an end-to-end supervised hierarchical clustering algorithm based on graph neural networks (GNN), called End-to-end Supervised HierARchical Clustering (E-SHARC). The E-SHARC approach uses front-end mel-filterbank features as input and jointly learns an embedding extractor and the GNN clustering module, performing representation learning, metric learning, and clustering with end-to-end optimization. Further, with additional inputs from an external overlap detector, the E-SHARC approach is capable of predicting the speakers in the overlapping speech regions. The experimental evaluation on several benchmark datasets like AMI, VoxConverse and DISPLACE, illustrates that the proposed E-SHARC framework improves significantly over the state-of-art diarization systems.
- [1896] arXiv:2401.12999 (cross-list from physics.chem-ph) [ pdf , other ]
-
Title: Quantum-Inspired Machine Learning for Molecular DockingAuthors: Runqiu Shu , Bowen Liu , Zhaoping Xiong , Xiaopeng Cui , Yunting Li , Wei Cui , Man-Hong Yung , Nan QiaoSubjects: Chemical Physics (physics.chem-ph) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Molecular docking is an important tool for structure-based drug design, accelerating the efficiency of drug development. Complex and dynamic binding processes between proteins and small molecules require searching and sampling over a wide spatial range. Traditional docking by searching for possible binding sites and conformations is computationally complex and results poorly under blind docking. Quantum-inspired algorithms combining quantum properties and annealing show great advantages in solving combinatorial optimization problems. Inspired by this, we achieve an improved in blind docking by using quantum-inspired combined with gradients learned by deep learning in the encoded molecular space. Numerical simulation shows that our method outperforms traditional docking algorithms and deep learning-based algorithms over 10\%. Compared to the current state-of-the-art deep learning-based docking algorithm DiffDock, the success rate of Top-1 (RMSD<2) achieves an improvement from 33\% to 35\% in our same setup. In particular, a 6\% improvement is realized in the high-precision region(RMSD<1) on molecules data unseen in DiffDock, which demonstrates the well-generalized of our method.
- [1897] arXiv:2401.13049 (cross-list from eess.IV) [ pdf , other ]
-
Title: CIS-UNet: Multi-Class Segmentation of the Aorta in Computed Tomography Angiography via Context-Aware Shifted Window Self-AttentionAuthors: Muhammad Imran , Jonathan R Krebs , Veera Rajasekhar Reddy Gopu , Brian Fazzone , Vishal Balaji Sivaraman , Amarjeet Kumar , Chelsea Viscardi , Robert Evans Heithaus , Benjamin Shickel , Yuyin Zhou , Michol A Cooper , Wei ShaoSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Abstract: Advancements in medical imaging and endovascular grafting have facilitated minimally invasive treatments for aortic diseases. Accurate 3D segmentation of the aorta and its branches is crucial for interventions, as inaccurate segmentation can lead to erroneous surgical planning and endograft construction. Previous methods simplified aortic segmentation as a binary image segmentation problem, overlooking the necessity of distinguishing between individual aortic branches. In this paper, we introduce Context Infused Swin-UNet (CIS-UNet), a deep learning model designed for multi-class segmentation of the aorta and thirteen aortic branches. Combining the strengths of Convolutional Neural Networks (CNNs) and Swin transformers, CIS-UNet adopts a hierarchical encoder-decoder structure comprising a CNN encoder, symmetric decoder, skip connections, and a novel Context-aware Shifted Window Self-Attention (CSW-SA) as the bottleneck block. Notably, CSW-SA introduces a unique utilization of the patch merging layer, distinct from conventional Swin transformers. It efficiently condenses the feature map, providing a global spatial context and enhancing performance when applied at the bottleneck layer, offering superior computational efficiency and segmentation accuracy compared to the Swin transformers. We trained our model on computed tomography (CT) scans from 44 patients and tested it on 15 patients. CIS-UNet outperformed the state-of-the-art SwinUNetR segmentation model, which is solely based on Swin transformers, by achieving a superior mean Dice coefficient of 0.713 compared to 0.697, and a mean surface distance of 2.78 mm compared to 3.39 mm. CIS-UNet's superior 3D aortic segmentation offers improved precision and optimization for planning endovascular treatments. Our dataset and code will be publicly available.
- [1898] arXiv:2401.13219 (cross-list from q-bio.GN) [ pdf , other ]
-
Title: TEPI: Taxonomy-aware Embedding and Pseudo-Imaging for Scarcely-labeled Zero-shot Genome ClassificationAuthors: Sathyanarayanan Aakur , Vishalini R. Laguduva , Priyadharsini Ramamurthy , Akhilesh RamachandranComments: Accepted to IEEE JBHISubjects: Genomics (q-bio.GN) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: A species' genetic code or genome encodes valuable evolutionary, biological, and phylogenetic information that aids in species recognition, taxonomic classification, and understanding genetic predispositions like drug resistance and virulence. However, the vast number of potential species poses significant challenges in developing a general-purpose whole genome classification tool. Traditional bioinformatics tools have made notable progress but lack scalability and are computationally expensive. Machine learning-based frameworks show promise but must address the issue of large classification vocabularies with long-tail distributions. In this study, we propose addressing this problem through zero-shot learning using TEPI, Taxonomy-aware Embedding and Pseudo-Imaging. We represent each genome as pseudo-images and map them to a taxonomy-aware embedding space for reasoning and classification. This embedding space captures compositional and phylogenetic relationships of species, enabling predictions in extensive search spaces. We evaluate TEPI using two rigorous zero-shot settings and demonstrate its generalization capabilities qualitatively on curated, large-scale, publicly sourced data.
- [1899] arXiv:2401.13315 (cross-list from eess.IV) [ pdf , other ]
-
Title: Deep Learning for Improved Polyp Detection from Synthetic Narrow-Band ImagingSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Abstract: To cope with the growing prevalence of colorectal cancer (CRC), screening programs for polyp detection and removal have proven their usefulness. Colonoscopy is considered the best-performing procedure for CRC screening. To ease the examination, deep learning based methods for automatic polyp detection have been developed for conventional white-light imaging (WLI). Compared with WLI, narrow-band imaging (NBI) can improve polyp classification during colonoscopy but requires special equipment. We propose a CycleGAN-based framework to convert images captured with regular WLI to synthetic NBI (SNBI) as a pre-processing method for improving object detection on WLI when NBI is unavailable. This paper first shows that better results for polyp detection can be achieved on NBI compared to a relatively similar dataset of WLI. Secondly, experimental results demonstrate that our proposed modality translation can achieve improved polyp detection on SNBI images generated from WLI compared to the original WLI. This is because our WLI-to-SNBI translation model can enhance the observation of polyp surface patterns in the generated SNBI images.
- [1900] arXiv:2401.13335 (cross-list from stat.ML) [ pdf , other ]
-
Title: Full Bayesian Significance Testing for Neural NetworksComments: Published as a conference paper at AAAI 2024Subjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Significance testing aims to determine whether a proposition about the population distribution is the truth or not given observations. However, traditional significance testing often needs to derive the distribution of the testing statistic, failing to deal with complex nonlinear relationships. In this paper, we propose to conduct Full Bayesian Significance Testing for neural networks, called \textit{n}FBST, to overcome the limitation in relationship characterization of traditional approaches. A Bayesian neural network is utilized to fit the nonlinear and multi-dimensional relationships with small errors and avoid hard theoretical derivation by computing the evidence value. Besides, \textit{n}FBST can test not only global significance but also local and instance-wise significance, which previous testing methods don't focus on. Moreover, \textit{n}FBST is a general framework that can be extended based on the measures selected, such as Grad-\textit{n}FBST, LRP-\textit{n}FBST, DeepLIFT-\textit{n}FBST, LIME-\textit{n}FBST. A range of experiments on both simulated and real data are conducted to show the advantages of our method.
- [1901] arXiv:2401.13703 (cross-list from math.HO) [ pdf , other ]
-
Title: Solving Some Geometry Problems of the Náboj 2023 Contest with Automated Deduction in GeoGebra DiscoveryAuthors: Amela Hota (The Private University College of Education of the Diocese of Linz, Austria), Zoltán Kovács (The Private University College of Education of the Diocese of Linz, Austria), Alexander Vujic (The Private University College of Education of the Diocese of Linz, Austria)Comments: In Proceedings ADG 2023, arXiv:2401.10725Journal-ref: EPTCS 398, 2024, pp. 110-123Subjects: History and Overview (math.HO) ; Artificial Intelligence (cs.AI); Computational Geometry (cs.CG); Symbolic Computation (cs.SC)
Abstract: In this article, we solve some of the geometry problems of the Náboj 2023 competition with the help of a computer, using examples that the software tool GeoGebra Discovery can calculate. In each case, the calculation requires symbolic computations. We analyze the difficulty of feeding the problem into the machine and set further goals to make the problems of this type of contests even more tractable in the future.
- [1902] arXiv:2401.13758 (cross-list from math.ST) [ pdf , other ]
-
Title: Assumptions and Bounds in the Instrumental Variable ModelComments: 27 pages, 1 figure, 1 table. Proofs of Theorems 1 and 2 stated in Richardson and Robins (2014) [ arXiv:1410.0470 ]. v2 improves the writing in a few placesSubjects: Statistics Theory (math.ST) ; Artificial Intelligence (cs.AI)
Abstract: In this note we give proofs for results relating to the Instrumental Variable (IV) model with binary response $Y$ and binary treatment $X$, but with an instrument $Z$ with $K$ states. These results were originally stated in Richardson & Robins (2014), "ACE Bounds; SEMS with Equilibrium Conditions," arXiv:1410.0470 .
- [1903] arXiv:2401.14089 (cross-list from quant-ph) [ pdf , other ]
-
Title: GQHAN: A Grover-inspired Quantum Hard Attention NetworkSubjects: Quantum Physics (quant-ph) ; Artificial Intelligence (cs.AI)
Abstract: Numerous current Quantum Machine Learning (QML) models exhibit an inadequacy in discerning the significance of quantum data, resulting in diminished efficacy when handling extensive quantum datasets. Hard Attention Mechanism (HAM), anticipated to efficiently tackle the above QML bottlenecks, encounters the substantial challenge of non-differentiability, consequently constraining its extensive applicability. In response to the dilemma of HAM and QML, a Grover-inspired Quantum Hard Attention Mechanism (GQHAM) consisting of a Flexible Oracle (FO) and an Adaptive Diffusion Operator (ADO) is proposed. Notably, the FO is designed to surmount the non-differentiable issue by executing the activation or masking of Discrete Primitives (DPs) with Flexible Control (FC) to weave various discrete destinies. Based on this, such discrete choice can be visualized with a specially defined Quantum Hard Attention Score (QHAS). Furthermore, a trainable ADO is devised to boost the generality and flexibility of GQHAM. At last, a Grover-inspired Quantum Hard Attention Network (GQHAN) based on QGHAM is constructed on PennyLane platform for Fashion MNIST binary classification. Experimental findings demonstrate that GQHAN adeptly surmounts the non-differentiability hurdle, surpassing the efficacy of extant quantum soft self-attention mechanisms in accuracies and learning ability. In noise experiments, GQHAN is robuster to bit-flip noise in accuracy and amplitude damping noise in learning performance. Predictably, the proposal of GQHAN enriches the Quantum Attention Mechanism (QAM), lays the foundation for future quantum computers to process large-scale data, and promotes the development of quantum computer vision.
- [1904] arXiv:2401.14171 (cross-list from eess.IV) [ pdf , other ]
-
Title: Predicting Hypoxia in Brain Tumors from Multiparametric MRIComments: 7 pages, 2 figuresSubjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI)
Abstract: This research paper presents a novel approach to the prediction of hypoxia in brain tumors, using multi-parametric Magnetic Resonance Imaging (MRI). Hypoxia, a condition characterized by low oxygen levels, is a common feature of malignant brain tumors associated with poor prognosis. Fluoromisonidazole Positron Emission Tomography (FMISO PET) is a well-established method for detecting hypoxia in vivo, but it is expensive and not widely available. Our study proposes the use of MRI, a more accessible and cost-effective imaging modality, to predict FMISO PET signals. We investigate deep learning models (DL) trained on the ACRIN 6684 dataset, a resource that contains paired MRI and FMISO PET images from patients with brain tumors. Our trained models effectively learn the complex relationships between the MRI features and the corresponding FMISO PET signals, thereby enabling the prediction of hypoxia from MRI scans alone. The results show a strong correlation between the predicted and actual FMISO PET signals, with an overall PSNR score above 29.6 and a SSIM score greater than 0.94, confirming MRI as a promising option for hypoxia prediction in brain tumors. This approach could significantly improve the accessibility of hypoxia detection in clinical settings, with the potential for more timely and targeted treatments.
- [1905] arXiv:2401.14206 (cross-list from eess.IV) [ pdf , other ]
-
Title: Exploiting Liver CT scans in Colorectal Carcinoma genomics mutation classificationAuthors: Daniele Perlo , Luca Berton , Alessia Delpiano , Francesca Menchini , Stefano Tibaldi , Marco Grosso , Paolo FonioJournal-ref: 2022 IEEE International Conference on Big Data (Big Data)Subjects: Image and Video Processing (eess.IV) ; Artificial Intelligence (cs.AI)
Abstract: The liver is the most involved organ by distant metastasis in colon-rectal cancer (CRC) patients and it comes necessary to be aware of the mutational status of the lesions to correctly design the best individual treatment. So far, efforts have been made in order to develop non-invasive and real-time methods that permit the analysis of the whole tumor, using new artificial intelligence tools to analyze the tumor's image obtained by Computed Tomography (CT) scan. In order to address the current medical workflow, that is biopsy analysis-based, we propose the first DeepLearning-based exploration, to our knowledge, of such classification approach from the patient medical imaging. We propose i) a solid pipeline for managing undersized datasets of available CT scans and ii) a baseline study for genomics mutation diagnosis support for preemptive patient follow-up. Our method is able to identify CRC RAS mutation family from CT images with 0.73 F1 score.
- [1906] arXiv:2401.14665 (cross-list from q-bio.BM) [ pdf , other ]
-
Title: PepGB: Facilitating peptide drug discovery via graph neural networksSubjects: Biomolecules (q-bio.BM) ; Artificial Intelligence (cs.AI)
Abstract: Peptides offer great biomedical potential and serve as promising drug candidates. Currently, the majority of approved peptide drugs are directly derived from well-explored natural human peptides. It is quite necessary to utilize advanced deep learning techniques to identify novel peptide drugs in the vast, unexplored biochemical space. Despite various in silico methods having been developed to accelerate peptide early drug discovery, existing models face challenges of overfitting and lacking generalizability due to the limited size, imbalanced distribution and inconsistent quality of experimental data. In this study, we propose PepGB, a deep learning framework to facilitate peptide early drug discovery by predicting peptide-protein interactions (PepPIs). Employing graph neural networks, PepGB incorporates a fine-grained perturbation module and a dual-view objective with contrastive learning-based peptide pre-trained representation to predict PepPIs. Through rigorous evaluations, we demonstrated that PepGB greatly outperforms baselines and can accurately identify PepPIs for novel targets and peptide hits, thereby contributing to the target identification and hit discovery processes. Next, we derive an extended version, diPepGB, to tackle the bottleneck of modeling highly imbalanced data prevalent in lead generation and optimization processes. Utilizing directed edges to represent relative binding strength between two peptide nodes, diPepGB achieves superior performance in real-world assays. In summary, our proposed frameworks can serve as potent tools to facilitate peptide early drug discovery.
- [1907] arXiv:2401.15801 (cross-list from stat.ML) [ pdf , ps , other ]
-
Title: On the Statistical Properties of Generative Adversarial Models for Low Intrinsic Data DimensionSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Statistics Theory (math.ST)
Abstract: Despite the remarkable empirical successes of Generative Adversarial Networks (GANs), the theoretical guarantees for their statistical accuracy remain rather pessimistic. In particular, the data distributions on which GANs are applied, such as natural images, are often hypothesized to have an intrinsic low-dimensional structure in a typically high-dimensional feature space, but this is often not reflected in the derived rates in the state-of-the-art analyses. In this paper, we attempt to bridge the gap between the theory and practice of GANs and their bidirectional variant, Bi-directional GANs (BiGANs), by deriving statistical guarantees on the estimated densities in terms of the intrinsic dimension of the data and the latent space. We analytically show that if one has access to $n$ samples from the unknown target distribution and the network architectures are properly chosen, the expected Wasserstein-1 distance of the estimates from the target scales as $O\left( n^{-1/d_\mu } \right)$ for GANs and $O\left( n^{-1/(d_\mu+\ell)} \right)$ for BiGANs, where $d_\mu$ and $\ell$ are the upper Wasserstein-1 dimension of the data-distribution and latent-space dimension, respectively. The theoretical analyses not only suggest that these methods successfully avoid the curse of dimensionality, in the sense that the exponent of $n$ in the error rates does not depend on the data dimension but also serve to bridge the gap between the theoretical analyses of GANs and the known sharp rates from optimal transport literature. Additionally, we demonstrate that GANs can effectively achieve the minimax optimal rate even for non-smooth underlying distributions, with the use of larger generator networks.
- [1908] arXiv:2401.15889 (cross-list from stat.ML) [ pdf , other ]
-
Title: Sliced Wasserstein with Random-Path Projecting DirectionsComments: 27 pages, 3 figures, 1 tableSubjects: Machine Learning (stat.ML) ; Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Abstract: Slicing distribution selection has been used as an effective technique to improve the performance of parameter estimators based on minimizing sliced Wasserstein distance in applications. Previous works either utilize expensive optimization to select the slicing distribution or use slicing distributions that require expensive sampling methods. In this work, we propose an optimization-free slicing distribution that provides a fast sampling for the Monte Carlo estimation of expectation. In particular, we introduce the random-path projecting direction (RPD) which is constructed by leveraging the normalized difference between two random vectors following the two input measures. From the RPD, we derive the random-path slicing distribution (RPSD) and two variants of sliced Wasserstein, i.e., the Random-Path Projection Sliced Wasserstein (RPSW) and the Importance Weighted Random-Path Projection Sliced Wasserstein (IWRPSW). We then discuss the topological, statistical, and computational properties of RPSW and IWRPSW. Finally, we showcase the favorable performance of RPSW and IWRPSW in gradient flow and the training of denoising diffusion generative models on images.
- [1909] arXiv:2401.16190 (cross-list from q-bio.QM) [ pdf , ps , other ]
-
Title: AI prediction of cardiovascular events using opportunistic epicardial adipose tissue assessments from CT calcium scoreAuthors: Tao Hu , Joshua Freeze , Prerna Singh , Justin Kim , Yingnan Song , Hao Wu , Juhwan Lee , Sadeer Al-Kindi , Sanjay Rajagopalan , David L. Wilson , Ammar HooriComments: 7 pages, 1 central illustration, 6 figures, 5 tablesSubjects: Quantitative Methods (q-bio.QM) ; Artificial Intelligence (cs.AI)
Abstract: Background: Recent studies have used basic epicardial adipose tissue (EAT) assessments (e.g., volume and mean HU) to predict risk of atherosclerosis-related, major adverse cardiovascular events (MACE). Objectives: Create novel, hand-crafted EAT features, 'fat-omics', to capture the pathophysiology of EAT and improve MACE prediction. Methods: We segmented EAT using a previously-validated deep learning method with optional manual correction. We extracted 148 radiomic features (morphological, spatial, and intensity) and used Cox elastic-net for feature reduction and prediction of MACE. Results: Traditional fat features gave marginal prediction (EAT-volume/EAT-mean-HU/ BMI gave C-index 0.53/0.55/0.57, respectively). Significant improvement was obtained with 15 fat-omics features (C-index=0.69, test set). High-risk features included volume-of-voxels-having-elevated-HU-[-50, -30-HU] and HU-negative-skewness, both of which assess high HU, which as been implicated in fat inflammation. Other high-risk features include kurtosis-of-EAT-thickness, reflecting the heterogeneity of thicknesses, and EAT-volume-in-the-top-25%-of-the-heart, emphasizing adipose near the proximal coronary arteries. Kaplan-Meyer plots of Cox-identified, high- and low-risk patients were well separated with the median of the fat-omics risk, while high-risk group having HR 2.4 times that of the low-risk group (P<0.001). Conclusion: Preliminary findings indicate an opportunity to use more finely tuned, explainable assessments on EAT for improved cardiovascular risk prediction.
- [1910] arXiv:2401.16458 (cross-list from q-fin.RM) [ pdf , other ]
-
Title: Credit Risk Meets Large Language Models: Building a Risk Indicator from Loan Descriptions in P2P LendingSubjects: Risk Management (q-fin.RM) ; Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Abstract: Peer-to-peer (P2P) lending has emerged as a distinctive financing mechanism, linking borrowers with lenders through online platforms. However, P2P lending faces the challenge of information asymmetry, as lenders often lack sufficient data to assess the creditworthiness of borrowers. This paper proposes a novel approach to address this issue by leveraging the textual descriptions provided by borrowers during the loan application process. Our methodology involves processing these textual descriptions using a Large Language Model (LLM), a powerful tool capable of discerning patterns and semantics within the text. Transfer learning is applied to adapt the LLM to the specific task at hand.
Our results derived from the analysis of the Lending Club dataset show that the risk score generated by BERT, a widely used LLM, significantly improves the performance of credit risk classifiers. However, the inherent opacity of LLM-based systems, coupled with uncertainties about potential biases, underscores critical considerations for regulatory frameworks and engenders trust-related concerns among end-users, opening new avenues for future research in the dynamic landscape of P2P lending and artificial intelligence. - [1911] arXiv:2401.17424 (cross-list from astro-ph.HE) [ pdf , other ]
-
Title: Application of Neural Networks for the Reconstruction of Supernova Neutrino Energy Spectra Following Fast Neutrino Flavor ConversionsComments: 13 pages, 6 figures, submitted to PRD. arXiv admin note: text overlap with arXiv:2311.15656Subjects: High Energy Astrophysical Phenomena (astro-ph.HE) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract: Neutrinos can undergo fast flavor conversions (FFCs) within extremely dense astrophysical environments such as core-collapse supernovae (CCSNe) and neutron star mergers (NSMs). In this study, we explore FFCs in a \emph{multi-energy} neutrino gas, revealing that when the FFC growth rate significantly exceeds that of the vacuum Hamiltonian, all neutrinos (regardless of energy) share a common survival probability dictated by the energy-integrated neutrino spectrum. We then employ physics-informed neural networks (PINNs) to predict the asymptotic outcomes of FFCs within such a multi-energy neutrino gas. These predictions are based on the first two moments of neutrino angular distributions for each energy bin, typically available in state-of-the-art CCSN and NSM simulations. Our PINNs achieve errors as low as $\lesssim6\%$ and $\lesssim 18\%$ for predicting the number of neutrinos in the electron channel and the relative absolute error in the neutrino moments, respectively.
- [1912] arXiv:2401.17455 (cross-list from physics.soc-ph) [ pdf , other ]
-
Title: Multiscale Parallel Tempering for Fast Sampling on Redistricting PlansComments: 26 pages with appendix; 11 figuresSubjects: Physics and Society (physics.soc-ph) ; Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI); Probability (math.PR)
Abstract: When auditing a redistricting plan, a persuasive method is to compare the plan with an ensemble of neutrally drawn redistricting plans. Ensembles are generated via algorithms that sample distributions on balanced graph partitions. To audit the partisan difference between the ensemble and a given plan, one must ensure that the non-partisan criteria are matched so that we may conclude that partisan differences come from bias rather than, for example, levels of compactness or differences in community preservation. Certain sampling algorithms allow one to explicitly state the policy-based probability distribution on plans, however, these algorithms have shown poor mixing times for large graphs (i.e. redistricting spaces) for all but a few specialized measures. In this work, we generate a multiscale parallel tempering approach that makes local moves at each scale. The local moves allow us to adopt a wide variety of policy-based measures. We examine our method in the state of Connecticut and succeed at achieving fast mixing on a policy-based distribution that has never before been sampled at this scale. Our algorithm shows promise to expand to a significantly wider class of measures that will (i) allow for more principled and situation-based comparisons and (ii) probe for the typical partisan impact that policy can have on redistricting.
- [1913] arXiv:2401.17513 (cross-list from physics.bio-ph) [ pdf , other ]
-
Title: A PNP ion channel deep learning solver with local neural network and finite element input dataComments: 17 pages, 4 figures, 5 tablesSubjects: Biological Physics (physics.bio-ph) ; Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
Abstract: In this paper, a deep learning method for solving an improved one-dimensional Poisson-Nernst-Planck ion channel (PNPic) model, called the PNPic deep learning solver, is presented. In particular, it combines a novel local neural network scheme with an effective PNPic finite element solver. Since the input data of the neural network scheme only involves a small local patch of coarse grid solutions, which the finite element solver can quickly produce, the PNPic deep learning solver can be trained much faster than any corresponding conventional global neural network solvers. After properly trained, it can output a predicted PNPic solution in a much higher degree of accuracy than the low cost coarse grid solutions and can reflect different perturbation cases on the parameters, ion channel subregions, and interface and boundary values, etc. Consequently, the PNPic deep learning solver can generate a numerical solution with high accuracy for a family of PNPic models. As an initial study, two types of numerical tests were done by perturbing one and two parameters of the PNPic model, respectively, as well as the tests done by using a few perturbed interface positions of the model as training samples. These tests demonstrate that the PNPic deep learning solver can generate highly accurate PNPic numerical solutions.
- [1914] arXiv:2401.17690 (cross-list from eess.AS) [ pdf , other ]
-
Title: EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio CaptioningComments: Accepted to ICASSP 2024Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Sound (cs.SD)
Abstract: We propose EnCLAP, a novel framework for automated audio captioning. EnCLAP employs two acoustic representation models, EnCodec and CLAP, along with a pretrained language model, BART. We also introduce a new training objective called masked codec modeling that improves acoustic awareness of the pretrained language model. Experimental results on AudioCaps and Clotho demonstrate that our model surpasses the performance of baseline models. Source code will be available at this https URL . An online demo is available at this https URL .
- [1915] arXiv:2401.17976 (cross-list from quant-ph) [ pdf , other ]
-
Title: Circuit Partitioning for Multi-Core Quantum Architectures with Deep Reinforcement LearningAuthors: Arnau Pastor , Pau Escofet , Sahar Ben Rached , Eduard Alarcón , Pere Barlet-Ros , Sergi AbadalSubjects: Quantum Physics (quant-ph) ; Artificial Intelligence (cs.AI)
Abstract: Quantum computing holds immense potential for solving classically intractable problems by leveraging the unique properties of quantum mechanics. The scalability of quantum architectures remains a significant challenge. Multi-core quantum architectures are proposed to solve the scalability problem, arising a new set of challenges in hardware, communications and compilation, among others. One of these challenges is to adapt a quantum algorithm to fit within the different cores of the quantum computer. This paper presents a novel approach for circuit partitioning using Deep Reinforcement Learning, contributing to the advancement of both quantum computing and graph partitioning. This work is the first step in integrating Deep Reinforcement Learning techniques into Quantum Circuit Mapping, opening the door to a new paradigm of solutions to such problems.
- [1916] arXiv:2401.15018 (cross-list from eess.AS) [ pdf , ps , other ]
-
Title: Enhancement of a Text-Independent Speaker Verification System by using Feature Combination and Parallel-Structure ClassifiersJournal-ref: Neural Computing and Applications 29 (2018) 637-651Subjects: Audio and Speech Processing (eess.AS) ; Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
Abstract: Speaker Verification (SV) systems involve mainly two individual stages: feature extraction and classification. In this paper, we explore these two modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is a crucial factor for performing robust speaker verification. The acoustic parameters used in the proposed system are: Mel Frequency Cepstral Coefficients (MFCC), their first and second derivatives (Deltas and Delta- Deltas), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear Predictive (PLP), and Relative Spectral Transform - Perceptual Linear Predictive (RASTA-PLP). In this paper, a complete comparison of different combinations of the previous features is discussed. On the other hand, the major weakness of a conventional Support Vector Machine (SVM) classifier is the use of generic traditional kernel functions to compute the distances among data points. However, the kernel function of an SVM has great influence on its performance. In this work, we propose the combination of two SVM-based classifiers with different kernel functions: Linear kernel and Gaussian Radial Basis Function (RBF) kernel with a Logistic Regression (LR) classifier. The combination is carried out by means of a parallel structure approach, in which different voting rules to take the final decision are considered. Results show that significant improvement in the performance of the SV system is achieved by using the combined features with the combined classifiers either with clean speech or in the presence of noise. Finally, to enhance the system more in noisy environments, the inclusion of the multiband noise removal technique as a preprocessing stage is proposed.
- [1917] arXiv:2401.15324 (cross-list from hep-ex) [ pdf , other ]
-
Title: Neutrino Reconstruction in TRIDENT Based on Graph Neural NetworkJournal-ref: Intelligent Computers, Algorithms, and Applications. IC 2023. Communications in Computer and Information Science, vol 2036Subjects: High Energy Physics - Experiment (hep-ex) ; Artificial Intelligence (cs.AI)
Abstract: TRopIcal DEep-sea Neutrino Telescope (TRIDENT) is a next-generation neutrino telescope to be located in the South China Sea. With a large detector volume and the use of advanced hybrid digital optical modules (hDOMs), TRIDENT aims to discover multiple astrophysical neutrino sources and probe all-flavor neutrino physics. The reconstruction resolution of primary neutrinos is on the critical path to these scientific goals. We have developed a novel reconstruction method based on graph neural network (GNN) for TRIDENT. In this paper, we present the reconstruction performance of the GNN-based approach on both track- and shower-like neutrino events in TRIDENT.
[ showing 1917 entries per page: fewer | more ]
Disable MathJax ( What is MathJax? )
Links to: arXiv , form interface , find , cs , 2404 , contact , h elp ( Access key information)